[HN Gopher] Ggml.ai joins Hugging Face to ensure the long-term p...
       ___________________________________________________________________
        
       Ggml.ai joins Hugging Face to ensure the long-term progress of
       Local AI
        
       Author : lairv
       Score  : 808 points
       Date   : 2026-02-20 13:51 UTC (1 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | rvz wrote:
       | This acquisition is almost the same as the acquisition of Bun by
       | Anthropic.
       | 
       | Both are $0 revenue "companies" that have created software that
       | is essential to the wider ecosystem and has mindshare value: Bun
       | for JavaScript and Ggml for AI models.
       | 
       | But of course the VCs needed an exit sooner or later. That was
       | inevitable.
        
         | andsoitis wrote:
         | I believe ggml.ai was funded by angel investors, not VC.
        
       | jimmydoe wrote:
       | Amazing. I like the openness of both projects and am really
       | excited for them.
       | 
       | Hopefully this does not mean consolidation because resources
       | dried up, but a true fusion of the best.
        
       | mnewme wrote:
       | Huggingface is the silent GOAT of the AI space, such a great
       | community and platform
        
         | lairv wrote:
         | Truly amazing that they've managed to build an open and
         | profitable platform without shady practices
        
           | al_borland wrote:
           | It's such a sad state of affairs when shady practices are so
           | normal that finding a company without them is noteworthy.
        
       | geooff_ wrote:
       | As someone who's been in the "AI" space for a while, it's strange
       | how Hugging Face went from one of the biggest names to not a part
       | of the discussion at all.
        
         | r_lee wrote:
         | I think that's because there's less local AI usage now since
         | there's all kinds of image models by the big labs, so there's
         | really no rush of people self hosting stable diffusion etc
         | anymore
         | 
         | the space moved from Consumer to Enterprise pretty fast due to
         | models getting bigger
        
           | zozbot234 wrote:
           | Today's free models are not really bigger when you account
           | for the use of MoE (with ever increasing sparsity, meaning a
           | smaller fraction of active parameters), and better ways of
           | managing KV caching. You can do useful things with very
           | little RAM/VRAM, it just gets slower and slower the more you
           | try to squeeze it where it doesn't quite belong. But that's
           | not a problem if you're willing to wait for every answer.
        
             | r_lee wrote:
             | yeah, but I mean more like the old setups where you'd just
             | load a model on a 4090 or something, even with MoE it's a
             | lot more complex and takes more VRAM, right? like it just
             | seems not justifiable for most hobbyists
             | 
             | but maybe I'm just slightly out of the loop
        
               | zozbot234 wrote:
               | With sparse MoE it's worth running the experts in system
               | RAM since that allows you to transparently use mmap and
               | inactive experts can stay on disk. Of course that's also
               | a slowdown unless you have enough RAM for the full set,
               | but it lets you run much larger models on smaller
               | systems.
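
A minimal sketch of the mmap idea described above, not how llama.cpp or any real runtime lays out its files. The file layout, the sizes, and the `read_expert` helper are all hypothetical. It memory-maps a file of fixed-size "expert" blobs and touches only the one that is needed, so the OS only has to page in that region while the rest can stay on disk.

```python
import mmap
import os
import tempfile

EXPERT_BYTES = 4096   # hypothetical size of one expert's weights
NUM_EXPERTS = 8       # hypothetical number of experts stored in the file


def write_fake_weights(path):
    """Write NUM_EXPERTS blobs; expert i is filled with the byte value i."""
    with open(path, "wb") as f:
        for i in range(NUM_EXPERTS):
            f.write(bytes([i]) * EXPERT_BYTES)


def read_expert(mapped, index):
    """Slice one expert out of the mapping; only those pages are touched."""
    start = index * EXPERT_BYTES
    return mapped[start:start + EXPERT_BYTES]


def demo(index=5):
    """Create a throwaway weights file, map it, and read a single expert."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_fake_weights(path)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
                return read_expert(mapped, index)
    finally:
        os.remove(path)
```

Calling `demo(5)` returns the bytes of expert 5 without explicitly reading the other seven into memory; whether the inactive pages stay out of RAM in practice depends on the OS page cache.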
        
         | LatencyKills wrote:
         | It isn't necessary to be part of the discussion if you are
         | truly adding value (which HF continues to do). It's nice to see
         | a company doing what it does best without constantly driving
         | the hype train.
        
         | segmondy wrote:
         | part of what discussion? anyone in the AI space knows and uses
         | HF, but the public doesn't give a care and why should they?
         | It's just an advanced site where nerds download AI stuff. HF is
         | super valuable with their transformers library, their code,
         | tutorials, smol-models, etc, but how does it translate to
         | investor dollars?
        
       | HanClinto wrote:
       | I'm regularly amazed that HuggingFace is able to make money. It
       | does so much good for the world.
       | 
       | How solid is its business model? Is it long-term viable? Will
       | they ever "sell out"?
        
         | I_am_tiberius wrote:
         | I once tried Hugging Face because I wanted to work through
         | some tutorial. They wanted my credit card details during the
         | registration as far as I remember. After a month they invoiced
         | me some amount of money and I had no idea what it was. To be
         | honest, I don't understand what exactly they do and what
         | services I was paying for, but I cancelled my account and never
         | touched it again. For me that was a totally opaque process.
        
           | shafyy wrote:
           | Their pricing seems pretty transparent:
           | https://huggingface.co/pricing
        
         | dmezzetti wrote:
         | They have paid hosting - https://huggingface.co/enterprise and
         | paid accounts. Also consulting services. Seems like a pretty
         | good foundation to me.
        
           | julien_c wrote:
           | and a lot of traction on paid (private in particular) storage
           | these days; sneak peek at new landing page:
           | https://huggingface.co/storage
        
         | microsoftedging wrote:
         | FT had a solid piece a few weeks back: "Why AI start-up Hugging
         | Face turned down a $500mn Nvidia deal"
         | 
         | https://giftarticle.ft.com/giftarticle/actions/redeem/9b4eca...
        
           | jackbravo wrote:
           | sounds very interesting, but even though it says
           | giftarticle.ft, I got blocked by a paywall.
        
             | nerevarthelame wrote:
             | https://archive.is/zSyUc
             | 
             | To summarize, they rejected Nvidia's offer because they
             | didn't want one outsized investor who could sway decisions.
             | And "the company was also able to turn down Nvidia due to
             | its stable finances. Hugging Face operates a 'freemium'
             | business model. Three per cent of customers, usually large
             | corporations, pay for additional features such as more
             | storage space and the ability to set up private
             | repositories."
        
               | bee_rider wrote:
               | Freemium seems to be working pretty well for them--what's
               | the alternative website, after all. They seem to command
               | their niche.
        
             | culi wrote:
             | find the Bypass Paywalls Clean extension. Never worry about
             | a paywall again
        
         | heliumtera wrote:
         | >Will they ever "sell out"?
         | 
         | Oh no, never. Don't worry, the usual investors are very well
         | known for fighting for user autonomy (AMD, Nvidia, Intel, IBM,
         | Qualcomm)
         | 
         | They are all very pro consumers and all backers are certainly
         | here for your enjoyment only
        
           | zozbot234 wrote:
           | These are all big hardware firms, which makes a lot of sense
           | as a classic 'commoditize the complement' play. Not exactly
           | pro-consumer, but not quite anti-consumer either!
        
             | 5o1ecist wrote:
             | > AMD, Nvidia, Intel, IBM, Qualcomm
             | 
             | > but not quite anti-consumer either!
             | 
             | All of them are public companies, which means that their
             | default state is anti-consumer and pro-shareholder. By law
             | they are required to do whatever they can to maximize
             | profits. History teaches that shareholders can demand
             | whatever they want, with the respective companies following
             | orders, since nobody ever really has to suffer consequences
             | and any and all potential fines are already priced in, in
             | advance, anyway.
             | 
             | Conversely, this is why Valve is such a great company.
             | Valve is probably one of the only few actual pro-consumer
             | companies out there.
             | 
             | Fun Fact! Rarely is it ever mentioned anywhere, but Valve
             | is not a public company! Valve is a private company! That's
             | why they can operate the way they do! If Valve was a public
             | company, then greedy, crooked billionaire shareholders
             | would have managed to get rid of Gabe a long time ago.
        
               | HanClinto wrote:
               | Great points.
               | 
               | Valve is one of my top favorite companies right now. Love
               | the work they're doing, and their products are amazing.
               | 
               | Can hardly wait for the Steam Frame.
        
               | RussianCow wrote:
               | > By law they are required to do whatever they can to
               | maximize profits.
               | 
               | I know it's a nit-pick, but I hate that this always gets
               | brought up when it's not actually true. Public
               | corporations face pressure from investors to maximize
               | returns, sure, but there is no law stating that they have
               | to maximize profits at all costs. Public companies can
               | (and often do) act against the interest of immediate
               | profits for some other gain. The only real leverage that
               | investors have is the board's ability to fire executives,
               | but that assumes that they have the necessary votes to do
               | so. As a counter-example, Mark Zuckerberg still controls
               | the majority of voting power at Meta, so he can
               | effectively do whatever he wants with the company without
               | major consequence (assuming you don't consider stock
               | price fluctuations "major").
               | 
               | But I say this not to take away from your broader point,
               | which I agree with: the short-term profit-maximizing
               | culture is indeed the default when it comes to publicly
               | traded corporations. It just isn't something inherent in
               | being publicly traded, and in the inverse, private
               | companies often have the same kind of culture, so that's
               | not a silver bullet either.
        
               | 5o1ecist wrote:
               | You're perfectly right and I don't consider it a nitpick.
               | I really should be more precise about this, instead of
               | spreading inaccuracies. Thank you!
        
               | chucksmash wrote:
               | It's a worthwhile point to make because if people believe
               | that misconception then it lets companies wash their
               | hands of flagrantly bad behavior. "Gosh, we should really
               | get around to changing the law that makes them act that
               | way."
        
             | smallerize wrote:
             | heliumtera is being sarcastic.
        
         | bityard wrote:
         | Their business model is essentially the same as GitHub. Host
         | lots of stuff for free and build a community around it, sell
         | the upscaled/private version to businesses. They are already
         | profitable.
        
           | HanClinto wrote:
           | This is what Sourceforge did too, and they still had the
           | DevShare adware thing didn't they?
           | 
           | GitHub is great -- huge fan. To some degree they "sold out"
           | to Microsoft and things could have gone more south, but
           | thankfully Microsoft has ruled them with a very kind hand,
           | and overall I'm extremely happy with the way they've handled
           | it.
           | 
           | I guess I always retain a bit of skepticism with such things,
           | and the long-term viability and goodness of such things never
           | feels totally sure.
        
       | dmezzetti wrote:
       | This is really great news. I've been one of the strongest
       | supporters of local AI, dedicating thousands of hours towards
       | building a framework to enable it. I'm looking forward to seeing
       | what comes of it!
        
         | logicallee wrote:
         | >I've been one of the strongest supporters of local AI,
         | dedicating thousands of hours towards building a framework to
         | enable it.
         | 
         | Sounds like you're very serious about supporting local AI. I
         | have a query for you (and anyone else who feels like donating)
         | about whether you'd be willing to donate some memory/bandwidth
         | resources p2p to hosting an offline model:
         | 
         | We have a local model we would like to distribute but don't
         | have a good CDN.
         | 
         | As a user/supporter question, would you be willing to donate
         | some spare memory/bandwidth in a simple dedicated browser tab
         | you keep open on your desktop that plays silent audio (so it is
         | not put in the background and unloaded) and then allocates
         | 100 MB to 1 GB of RAM and acts as a WebRTC peer, serving
         | checksummed models?[1] (Then our server only has to check that
         | you still have it from time to time, by sending you some salt
         | and a part of the file to hash, and your tab proves it still
         | has it by doing so). This doesn't require any trust, and the
         | receiving user will also hash it and report if there's a
         | mismatch.
         | 
         | Our server federates the p2p connections, so when someone
         | downloads they do so from a trusted peer (one who has
         | contributed and passed the audits) like you. We considered
         | building a binary for people to run, but we figured that people
         | couldn't trust our binaries, or that someone would target our
         | build process somehow. We are paranoid about trust, whereas a
         | web model is inherently untrusted and safer. Why do all this?
         | 
         | The purpose of this would be to host an offline model: we
         | successfully ported a 1 GB model from C++ and Python to WASM
         | and WebGPU (you can see Claude doing so here, we livestreamed
         | some of it[2]), but the model weights at 1 GB are too much for
         | us to host.
         | 
         | Please let us know whether this is something you would
         | contribute a background tab to hosting on your desktop. It
         | wouldn't impact you much and you could set how much memory to
         | dedicate to it, but you would have the good feeling of knowing
         | that you're helping people run a trusted offline model if they
         | want - from their very own browser, no download required. The
         | model we ported is fast enough for anyone to run on their own
         | machines. Let me know if this is something you'd be willing to
         | keep a tab open for.
         | 
         | [1] filesharing over webrtc works like this:
         | https://taonexus.com/p2pfilesharing/ you can try it in 2
         | browser tabs.
         | 
         | [2] https://www.youtube.com/watch?v=tbAkySCXyp0 and some other
         | videos
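
The audit described in this comment (server sends a salt and names a part of the file, peer hashes it) could look roughly like the sketch below. This is one reading of the comment, not the poster's actual protocol: the function names, the SHA-256 choice, and the 1024-byte chunk size are assumptions, and it assumes the server keeps a reference copy of the file to check answers against.

```python
import hashlib
import secrets


def pick_challenge(file_size, chunk_size=1024):
    """Server side: choose a fresh random salt and a byte range to audit."""
    salt = secrets.token_bytes(16)
    length = min(chunk_size, file_size)
    start = secrets.randbelow(file_size - length + 1)
    return salt, start, length


def answer_challenge(data, salt, start, length):
    """Peer side: hash the salt followed by the requested slice of the file."""
    return hashlib.sha256(salt + data[start:start + length]).hexdigest()


def verify(reference, salt, start, length, answer):
    """Server side: recompute the hash from the reference copy and compare."""
    expected = answer_challenge(reference, salt, start, length)
    return secrets.compare_digest(expected, answer)
```

Because the salt and range change on every audit, a peer cannot precompute answers and discard the file, although a single small range only proves possession of that range.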
        
           | liuliu wrote:
           | > We have a local model we would like to distribute but don't
           | have a good CDN.
           | 
           | That is not true. I am serving models off Cloudflare R2. It
           | is 1 petabyte per month in egress use and I basically pay
           | peanuts (~$200 everything included).
        
             | logicallee wrote:
             | 1 petabyte per month is 1 million downloads of a 1 GB file.
             | We intend to scale to more than 1 million downloads per
             | month. We have a specific scaling architecture in mind.
             | We're qualified to say this because we've ported a billion
             | parameter model to run in your browser - fast - on either
             | webgpu or wasm. (You can see us doing it live at the
             | youtube link in my comment above.) There is a lot of demand
             | for that.
        
               | liuliu wrote:
               | The bandwidth is free on Cloudflare R2. I paid money for
               | storage (~10TiB storage of different models). If you only
               | host 1GiB file there, you are only paying $0.01 per month
               | I believe.
        
               | dirasieb wrote:
               | how about you work on achieving 1 million downloads per
               | month first? talk about putting the cart before the
               | horse
        
           | echoangle wrote:
           | Maybe stupid question but why not just put it in a torrent?
        
             | logicallee wrote:
             | Torrents require users to download and install a torrent
             | client! In addition, we would like to retain the
             | possibility of giving live updates to the latest version of
             | a sovereign fine-tuned file, torrents don't autoupdate. We
             | want to keep improving what people get.
             | 
             | Finally, we would like the possibility of setting up market
             | dynamics in the future: if you aren't currently using all
             | your ram, why not rent it out? This matches the p2p edge
             | architecture we envision.
             | 
             | In addition, our work on WebGPU would allow you to rent out
             | your gpu to a background tab whenever you're not using it.
             | Why have all that silicon sit idle when you could rent it
             | out?
             | 
             | You could also donate it to help fine tune our own
             | sovereign model.
             | 
             | All of this will let us bootstrap to the point where we
             | could be trusted with a download.
             | 
             | We have a rather paranoid approach to security.
        
             | liuliu wrote:
             | It is very simple. Storage / bandwidth is not expensive.
             | Residential bandwidth is. If you can convince people to
             | install bandwidth-related software in their homes, you can
             | then charge other people $5 to $10 per 1GiB of bandwidth
             | (mostly useful for botnets: getting around DDoS
             | protections and reCAPTCHA tasks).
        
               | logicallee wrote:
               | Thank you for your suggestion. Below are only our
               | plans/intentions; we welcome feedback on them:
               | 
               | We are not going to do what you suggest. Instead, our
               | approach is to use the RAM people aren't using at the
               | moment for a fast edge cache close to their area.
               | 
               | We've tried this architecture and get very low latency
               | and high bandwidth. People would not be contributing
               | their resources to anything they don't know about.
        
           | HanClinto wrote:
           | I think model weights for projects like this are something
           | that you could upload to a space on Hugging Face?
           | 
           | What services would you need that Hugging Face doesn't
           | provide?
        
       | beoberha wrote:
       | Seems like a great fit - kinda surprised it didn't happen sooner.
       | I think we are deep in the valley of local AI, but I'd be willing
       | to bet it breaks out in the next 2-3 years. Here's hoping!
        
         | breisa wrote:
         | I mean they already supported the project quite a bit. @ngxson
         | and maybe others? from Huggingface are big contributors to
         | llama.cpp.
        
       | mythz wrote:
       | I consider HuggingFace more "Open AI" than OpenAI - one of the
       | few quiet heroes (along with Chinese OSS) helping bring on-
       | premise AI to the masses.
       | 
       | I'm old enough to remember when traffic was expensive, so I've no
       | idea how they've managed to offer free hosting for so many
       | models. Hopefully it's backed by a sustainable business model, as
       | the ecosystem would be meaningfully worse without them.
       | 
       | We still need good value hardware to run Kimi/GLM in-house, but
       | at least we've got the weights and distribution sorted.
        
         | zozbot234 wrote:
         | > We still need good value hardware to run Kimi/GLM in-house
         | 
         | If you stream weights in from SSD storage and freely use swap
         | to extend your KV cache it will be really slow (multiple
         | seconds per token!) but run on basically anything. And that's
         | still really good for stuff that can be computed overnight,
         | perhaps even by batching many requests simultaneously. It gets
         | progressively better as you add more compute, of course.
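
A back-of-the-envelope sketch of why SSD streaming lands in the "seconds per token" range. It assumes every token requires re-reading all active weights from disk with nothing cached, and it ignores compute time and KV cache traffic. The parameter count, quantization width, and SSD speed below are hypothetical, not measurements of Kimi or GLM.

```python
def seconds_per_token(active_params, bytes_per_param, read_bytes_per_sec):
    """Rough time per token if the active weights are re-read for each token."""
    return active_params * bytes_per_param / read_bytes_per_sec


# Hypothetical setup: 30e9 active parameters, 4-bit weights (0.5 bytes each),
# and an SSD that sustains 3e9 bytes per second of reads.
estimate = seconds_per_token(30e9, 0.5, 3e9)
print(f"{estimate:.1f} s/token")
```

Under those assumed numbers the estimate is 5 seconds per token; keeping part of the weights resident in RAM, or batching requests so one read serves many tokens, lowers the effective figure.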
        
           | HPsquared wrote:
           | At a certain point the energy starts to cost more than
           | renting some GPUs.
        
             | vardalab wrote:
             | Yeah, that is hard to argue with because I just go to
             | OpenRouter and play around with a lot of models before I
             | decide which ones I like. But there's something special
             | about running it locally in your basement
        
               | dotancohen wrote:
               | I'd love to hear more about this. How do you decide that
               | you like a model? For which use cases?
        
             | fc417fc802 wrote:
             | Aren't decent GPU boxes in excess of $5 per hour? At $0.20
             | per kWh (which is on the high side in the US), running a 1
             | kW workstation around the clock costs about $4.80 per day,
             | roughly the same price as 1 hour of GPU time.
             | 
             | The issue you'll actually run into is that most residential
             | housing isn't wired for more than ~2kW per room.
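
The arithmetic in this comment, written out as a small sketch. The $0.20 per kWh, 1 kW, and $5 per hour figures come from the comment itself; the function names are illustrative.

```python
def electricity_cost(power_kw, hours, price_per_kwh):
    """Cost of running a machine drawing power_kw for the given hours."""
    return power_kw * hours * price_per_kwh


def break_even_hours(gpu_price_per_hour, power_kw, price_per_kwh):
    """Hours of local runtime that cost the same as one rented GPU hour."""
    return gpu_price_per_hour / (power_kw * price_per_kwh)


daily = electricity_cost(1.0, 24, 0.20)    # about 4.80 dollars per day
hours = break_even_hours(5.0, 1.0, 0.20)   # about 25 hours per GPU hour
print(f"${daily:.2f} per day, {hours:.0f} local hours per rented GPU hour")
```

This compares electricity only; it leaves out the purchase price of the workstation and any difference in throughput between the two machines.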
        
           | Aurornis wrote:
           | > it will be really slow (multiple seconds per token!)
           | 
           | This is fun for proving that it can be done, but that's 100X
           | slower than hosted models and 1000X slower than GPT-Codex-
           | Spark.
           | 
           | That's like going from real time conversation to e-mailing
           | someone who only checks their inbox twice a day if you're
           | lucky.
        
             | zozbot234 wrote:
             | You'd need real rack-scale/datacenter infrastructure to
             | properly match the hosted models that are keeping
             | everything in fast VRAM at all times, and then you only get
             | reasonable utilization on that by serving requests from
             | many users. The ~100X slower tier is totally okay for
             | experimentation and non-conversational use cases (including
             | some that are more agentic-like!), and you'd reach ~10X
             | (quite usable for conversation) by running something like a
             | good homelab.
        
         | data-ottawa wrote:
         | Can we toss in the work unsloth does too as an unsung hero?
         | 
         | They provide excellent documentation and they're often very
         | quick to get high quality quants up in major formats. They're a
         | very trustworthy brand.
        
           | cubie wrote:
           | I'm a big fan of their work as well, good shout.
        
             | danielhanchen wrote:
             | Thank you!
        
           | disiplus wrote:
           | Yeah, they're the good guys. I suspect the open source work
           | is mostly advertisements for them to sell consulting and
           | services to enterprises. Otherwise, the work they do doesn't
           | make sense to offer for free.
        
             | arcanemachiner wrote:
             | I hope that is exactly what is happening. It benefits them,
             | and it benefits us.
        
             | danielhanchen wrote:
             | Haha for now our primary goal is to expand the market for
             | local AI and educate people on how to do RL, fine-tuning
             | and running quants :)
        
               | WanderPanda wrote:
               | Amazing work and people should really appreciate that the
               | opportunity costs of your work are immense (given the
               | hype).
               | 
               | On another note: I'm a bit paranoid about quantization. I
               | know people are not good at discerning model quality at
               | these levels of "intelligence" anymore, I don't think a
               | vibe check really catches the nuances. How hard would it
               | be to systematically evaluate the different
               | quantizations? E.g. on the Aider benchmark that you used
               | in the past?
               | 
               | I was recently trying Qwen 3 Coder Next and there are
               | benchmark numbers in your article but they seem to be for
               | the official checkpoint, not the quantized ones. But it
               | is not even really clear (and chatbots confuse them for
               | benchmarks of the quantized versions btw.)
               | 
               | I think systematic/automated benchmarks would really
               | bring the whole effort to the next level. Basically
               | something like the bar chart from the Dynamic
               | Quantization 2.0 article but always updated with all
               | kinds of recent models.
        
               | Zetaphor wrote:
               | This would be amazing
        
               | danielhanchen wrote:
               | Working on it! :)
        
               | jychang wrote:
               | > How hard would it be to systematically evaluate the
               | different quantizations? E.g. on the Aider benchmark that
               | you used in the past?
               | 
               | Very hard. $$$
               | 
               | The benchmarks are not cheap to run. It'll cost a lot to
               | run them for each quant of each model.
        
               | danielhanchen wrote:
               | Yes sadly very expensive :( Maybe a select few quants
               | could happen - we're still figuring out what is the most
               | economical and most efficient way to benchmark!
        
               | illusive4080 wrote:
               | Roughly how much does it cost to run one of the popular
               | benchmarks? Are we talking $1,000, $10,000, or $100k?
        
               | danielhanchen wrote:
               | Thanks! Yes we actually did think about that - it can get
               | quite expensive sadly - perplexity benchmarks over short
               | context lengths with small datasets are doable, but they're
               | not an accurate measure. We're currently investigating the
               | most efficient way to evaluate quants - will keep you
               | posted!
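
For reference, the perplexity measure mentioned here is the exponential of the average negative log-probability per token. The sketch below shows only that formula; it is not Unsloth's evaluation code, and the log-probabilities in the example are made up rather than taken from any model.

```python
import math


def perplexity(token_logprobs):
    """exp of the mean negative natural-log probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def relative_increase(ppl_base, ppl_quant):
    """Fractional perplexity increase of a quant over the base checkpoint."""
    return (ppl_quant - ppl_base) / ppl_base


# Made-up per-token log-probabilities for the same text under two checkpoints.
base = perplexity([-1.0, -2.0, -1.5, -0.5])
quant = perplexity([-1.1, -2.2, -1.6, -0.6])
print(f"base={base:.3f} quant={quant:.3f} "
      f"increase={relative_increase(base, quant):.1%}")
```

As the comment notes, a small perplexity gap on short contexts does not guarantee equal quality on long, multi-step tasks such as coding benchmarks.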
        
           | danielhanchen wrote:
           | Oh thank you - appreciate it :)
        
           | swyx wrote:
           | not that unsung! we've given them our biggest workshop spot
           | every single year we've been able to and will do until they
           | are tired of us
           | https://www.youtube.com/@aiDotEngineer/search?query=unsloth
        
             | danielhanchen wrote:
             | Appreciate it immensely haha :) Never tired - always
             | excited and pumped for this year!
        
         | sowbug wrote:
         | Why doesn't HF support BitTorrent? I know about hf-torrent and
         | hf_transfer, but those aren't nearly as accessible as a link in
         | the web UI.
        
           | embedding-shape wrote:
           | > Why doesn't HF support BitTorrent?
           | 
           | Harder to track downloads then. Only when clients hit the
           | tracker would they be able to get download stats, and forget
           | about private repositories or the "gated" ones that
           | Meta/Facebook does for their "open" models.
           | 
           | Still, if vanity metrics weren't so important, it'd be a great
           | option. I've even thought of creating my own torrent mirror
           | of HF to provide as a public service, as eventually access to
           | models will be restricted, and it would be nice to be
           | prepared for that moment a bit better.
        
             | sowbug wrote:
             | I thought of the tracking and gate questions, too, when I
             | vibed up an HF torrent service a few nights ago. (Super
             | annoying BTW to have to download the files just to hash the
             | parts, especially when webseeds exist.) Model owners could
             | disable or gate torrents the same way they gate the models,
             | and HF could still measure traffic by .torrent downloads
             | and magnet clicks.
             | 
             | It's a bit like any legalization question -- the black
             | market exists anyway, so a regulatory framework could bring
             | at least some of it into the sunlight.
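
The "download the files just to hash the parts" complaint comes from how BitTorrent v1 metadata is built: the file is cut into fixed-size pieces and each piece gets a SHA-1 digest, so whoever creates the torrent has to read every byte. A minimal single-file sketch follows; the 256 KiB default piece length is just a common choice, and this is not the commenter's actual service.

```python
import hashlib
import io


def piece_hashes(stream, piece_length=262144):
    """Concatenate the SHA-1 digest of each fixed-size piece of the stream."""
    digests = []
    while True:
        piece = stream.read(piece_length)
        if not piece:
            break
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


# Tiny in-memory example: 10 bytes with 4-byte pieces gives 3 pieces.
pieces = piece_hashes(io.BytesIO(b"a" * 10), piece_length=4)
print(len(pieces) // 20, "pieces")
```

The last piece is allowed to be shorter than the others, which is why 10 bytes with 4-byte pieces yields three digests.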
        
               | embedding-shape wrote:
               | > Model owners could disable or gate torrents the same
               | way they gate the models, and HF could still measure
               | traffic by .torrent downloads and magnet clicks.
               | 
               | But that'll only stop a small part, anyone could share
               | the infohash and if you're using the dht/magnet without
               | .torrent files or clicks on a website, no one can count
               | those downloads unless they too scrape the dht for peers
               | who are reporting they've completed the download.
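
To make the infohash point concrete: in BitTorrent v1 the infohash is the SHA-1 of the bencoded `info` dictionary, so anyone holding the metadata can derive a magnet link without ever touching a website. The sketch below is a minimal bencoder for illustration only; the example `info` values are invented, and a real magnet link usually carries extra parameters such as a display name or trackers.

```python
import hashlib


def bencode(value):
    """Encode ints, bytes, lists, and dicts with bytes keys as bencode."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted(value.items())  # keys must appear in sorted byte order
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def magnet_link(info):
    """Build a bare magnet URI from a v1 info dictionary."""
    infohash = hashlib.sha1(bencode(info)).hexdigest()
    return "magnet:?xt=urn:btih:" + infohash


# Invented single-file info dict: 10 bytes, 4-byte pieces, 3 dummy digests.
example_info = {
    b"name": b"model.gguf",
    b"length": 10,
    b"piece length": 4,
    b"pieces": b"\x00" * 60,
}
print(magnet_link(example_info))
```

Since the link is a pure function of the metadata, counting `.torrent` downloads or magnet clicks on one site says nothing about copies of the link shared elsewhere.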
        
               | sowbug wrote:
               | Right, but that's already happening today. That's the
               | black-market point.
        
               | fc417fc802 wrote:
               | > unless they too scrape the dht for peers who are
               | reporting they've completed the download.
               | 
               | Which can be falsified. Head over to your favorite
               | tracker and sort by completed downloads to see what I
               | mean.
        
             | taminka wrote:
             | most of the traffic is probably from open weights, just
             | seed those, host private ones as is
        
             | homarp wrote:
             | how are all the private trackers tracking ratios?
        
             | jimbob45 wrote:
             | Wouldn't it still provide massive benefits if they could
             | convince/coerce their most popular downloaded models to
             | move to torrenting?
        
               | intrasight wrote:
               | Benefit to you, but great downside to the three letter
               | agencies that inject their goods into these models.
        
             | Barbing wrote:
             | That would be a very nice service. I think folks might rely
             | on it for a number of reasons, including that we'll want to
             | see how biases changed over time. What got sloppier,
             | shillier...
        
         | Fin_Code wrote:
         | I still don't know why they are not running on torrent. It's the
         | perfect use case.
        
           | heliumtera wrote:
           | How can you be the man in the middle in a truly P2P
           | environment?
        
           | freedomben wrote:
           | That would shut out most people working for big corp, which
           | is probably a huge percentage of the user base. It's dumb,
           | but that's just the way corp IT is (no torrenting allowed).
        
             | zozbot234 wrote:
             | It's a sensible option, even when not everyone can really
             | use it. Linux distros are routinely transferred via torrent,
             | so why not other massive, open-licensed data?
        
               | freedomben wrote:
               | Oh as an option, yeah I agree it makes a ton of sense. I
               | just would expect a very, very small percentage of people
               | to use the torrent over the direct download. With Linux
               | distros, the vast majority of downloads still come from
               | standard web servers. When I download distro images I opt
               | for torrents, but very few people do the same
        
               | zrm wrote:
               | With Linux distros they typically put the web link right
               | on the main page and have a torrent available if you go
               | look for it, because they want you to try their distro
               | more than they want to save some bandwidth.
               | 
               | Suppose HF did the opposite because the bandwidth saved
               | is more and they're not as concerned you might download a
               | different model from someone else.
        
               | Const-me wrote:
               | > very small percentage of people to use the torrent over
               | the direct download
               | 
               | BitTorrent protocol is IMO better for downloading large
               | files. When I want to download something which exceeds a
               | couple of GB, and I see two links, direct download and
               | BitTorrent, I always click on the torrent.
               | 
               | On paper, HTTP supports range requests to resume partial
               | downloads. IME, it seems modern web browsers neglected to
               | implement it properly. They won't resume after browser is
               | reopened, or the computer is restarted. Command-line HTTP
               | clients like wget are more reliable, however many web
               | servers these days require some session cookies or one-
               | time query string tokens, and it's hard to pass that
               | stuff from browser to command-line.
               | 
               | I live in Montenegro, CDN connectivity is not great here.
               | Only a few of them like steam and GOG saturate my 300
               | megabit/sec download link. Others are much slower, e.g.
               | windows updates download at about 100 megabit/sec.
               | BitTorrent protocol almost always delivers the 300
               | megabit/sec bandwidth.
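
The range-request resume described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not how any particular browser or wget implements it; the function names and the example URL are hypothetical, and it assumes the server honors `Range` and needs no session cookies or one-time tokens.

```python
import os
import urllib.request


def range_request(url: str, offset: int) -> urllib.request.Request:
    """Build a GET request that asks the server to start at byte `offset`."""
    req = urllib.request.Request(url)
    if offset > 0:
        # "bytes=N-" means: from byte N to the end of the resource.
        req.add_header("Range", f"bytes={offset}-")
    return req


def resume_download(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Continue a partial download at `path`, or start over if that is not possible."""
    offset = os.path.getsize(path) if os.path.exists(path) else 0
    with urllib.request.urlopen(range_request(url, offset)) as resp:
        # 206 Partial Content: the server honored the range, so append.
        # 200 OK: the server ignored the range and sends the whole file, so overwrite.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as out:
            while chunk := resp.read(chunk_size):
                out.write(chunk)
    # Note: if the file is already complete, many servers answer 416,
    # which urlopen raises as urllib.error.HTTPError.


if __name__ == "__main__":
    req = range_request("https://example.com/model.gguf", 1024)
    print(req.get_header("Range"))
```

Unlike BitTorrent, this gives no per-piece integrity check, so a checksum comparison after the download is still advisable.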
        
               | thot_experiment wrote:
               | I have terabytes of linux isos I got via torrents, many
               | such cases!
        
         | Tepix wrote:
         | It's insane how much traffic HF must be pushing out of the
         | door. I routinely download models that are hundreds of
         | gigabytes in size from them. A fantastic service to the
         | sovereign AI community.
        
           | vardalab wrote:
           | Yup, I have downloaded probably a terabyte in the last week,
           | especially with the Step 3.5 model being released and Minimax
           | quants. I wonder what my ISP thinks. I hope they don't cut me
           | off. They gave me a fast lane, they better let me use it, lol
        
             | fc417fc802 wrote:
             | Even fairly restrictive data caps are in the range of 6 TB
             | per month. P2P at a mere 100 Mb/s works out to roughly 1 TiB
             | per 24 hours.
             | 
             | Hypothetically my ISP will sell me unmetered 10 Gb service
             | but I wonder if they would actually make good on their word
             | ...
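
The arithmetic behind the daily-transfer figure can be checked with a short sketch. It assumes a link that is saturated around the clock and reads "100 Mb" as 100 megabits per second; the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(link_mbit_per_s: float) -> float:
    """Bytes moved in 24 hours by a link running flat out at the given rate."""
    return link_mbit_per_s * 1e6 / 8 * SECONDS_PER_DAY


def days_to_hit_cap(cap_tb: float, link_mbit_per_s: float) -> float:
    """Days of saturated transfer before a monthly cap (in decimal TB) is used up."""
    return cap_tb * 1e12 / bytes_per_day(link_mbit_per_s)


if __name__ == "__main__":
    daily = bytes_per_day(100)
    print(f"100 Mb/s: {daily / 1e12:.2f} TB/day ({daily / 2**40:.2f} TiB/day)")
    for cap in (6, 1.2):
        print(f"{cap} TB cap lasts {days_to_hit_cap(cap, 100):.1f} days at 100 Mb/s")
```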
        
               | 3eb7988a1663 wrote:
               | I have a 1.2TB cap before you start getting charged
               | extra, so you might need to recalibrate your restrictive
               | level.
        
               | fc417fc802 wrote:
               | Is that with a WISP by chance? Or in a developing
               | country? Or are there really wired providers with such
               | low caps in the western world in this day and age?
        
               | nagaiaida wrote:
               | well it's my wired cap a stone's throw from buildings
               | with google cloud logos on the side in a major us city,
               | so...
        
               | zargon wrote:
               | Comcast.
        
               | Zetaphor wrote:
               | AT&T once told me if I don't pay for their TV service then
               | my home gigabit fiber would have a 1TB cap. They had an
               | agreement with the apartment building so I had no other
               | choice of provider.
        
               | fc417fc802 wrote:
               | Buy our off brand netflix or else we'll make it so you
               | can't watch netflix. How is that legal?
        
               | Zetaphor wrote:
               | The law is written by the highest bidder, and the telecom
               | lobbyists are very generous
        
           | razster wrote:
           | My fear is that these large "AI" companies will lobby to have
           | these open source options removed or banned, growing concern.
           | I'm not sure how else to explain how much I enjoy using what
           | HF provides, I religiously browse their site for new and
           | exciting models to try.
        
             | culi wrote:
             | ModelScope is the Chinese equivalent of Hugging Face and a
             | good backup. All the open models are Chinese anyways
        
               | thot_experiment wrote:
               | Not true! Mistral is really really good, but I agree that
               | there isn't a single decent open model from the USA.
        
               | culi wrote:
               | Mistral is cool and I wish them success but it
               | consistently ranks extremely low on benchmarks while
               | still being expensive. Chinese models like DeepSeek might
               | rank almost as low as Mistral but they are significantly
               | cheaper. And Kimi is the best of both worlds with
               | incredible benchmark results while still being incredibly
               | cheap
               | 
               | I know things change rapidly so I'm not counting them out
               | quite yet but I don't see them as a serious contender
               | currently
        
               | Eupolemos wrote:
               | Why are you talking price when we are talking local AI?
               | 
               | That doesn't make any sense to me. Am I missing
               | something?
        
               | culi wrote:
               | Your electricity is free?
        
               | cpburns2009 wrote:
               | If you have the hardware to run expensive models, is the
               | cost of electricity much of a factor? According to
               | Google, the average price in the Silicon Valley Area is
               | $0.448 per kWh. An RTX 5090 costs about $4,000 and has a
               | peak power consumption of 1000 W. Maxing out that GPU for
               | a whole year would cost $3,925 at that rate. It's not
               | particularly more expensive than that hardware itself.
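
The yearly figure above can be reproduced with a small sketch. The wattage and price are the ones quoted in the comment, taken as given rather than verified, and the `utilization` parameter is an added assumption for cards that are not maxed out all year.

```python
HOURS_PER_YEAR = 365 * 24


def annual_energy_cost(watts: float, price_per_kwh: float, utilization: float = 1.0) -> float:
    """Yearly electricity cost for a device drawing `watts` for a fraction of the year."""
    kwh = watts / 1000 * HOURS_PER_YEAR * utilization
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Figures quoted in the thread: 1000 W peak draw, $0.448 per kWh.
    print(f"always on:   ${annual_energy_cost(1000, 0.448):,.2f}")
    # Hypothetical lighter usage: busy a quarter of the time.
    print(f"25% of time: ${annual_energy_cost(1000, 0.448, utilization=0.25):,.2f}")
```

At full load this lands at about $3,924 per year, in line with the number quoted; real costs scale down with actual power draw and hours of use.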
        
               | culi wrote:
               | At that point it'd be cheaper to get an expensive
               | subscription to a cloud platform AI product. I understand
               | the case for local LLMs but it seems silly to worry about
               | pricing for cloud-based offerings but not worry about
               | pricing for locally run models. Especially since running
               | it locally can often be more expensive
        
               | seanmcdirmid wrote:
               | Apple silicon is crazy efficient as well as being
               | comparable to GPUs in performance for max and ultra
               | chips.
        
               | thot_experiment wrote:
               | for almost the entire year, yes.
        
               | dirasieb wrote:
               | 15 missed calls from your local power company
        
               | thot_experiment wrote:
               | Sure, benchmarks are fake and I use Mistral over
               | equivalently sized models most of the time because it's
               | better in real life. It runs plenty fast for me, I don't
               | pay for inference.
        
               | BoredomIsFun wrote:
               | > it consistently ranks extremely low on benchmarks
               | 
               | As general purpose chatbots small Mistral models are
               | better than comparably sized Chinese models, as they
               | have better SimpleQA scores and general knowledge of
               | Western culture.
        
               | seanmcdirmid wrote:
               | It's really hard to beat qwen coder, especially for role
               | play where the instruction following is really useful. I
               | don't think their corpus is lacking in western knowledge,
               | although I wonder if Chinese users get even better
               | results from it?
        
               | BoredomIsFun wrote:
               | > It's really hard to beat qwen coder, for role play
               | 
               | I am not sure if you actually tried that. Mistrals are
               | widely accepted go-to models for roleplay and creative
               | writing. No Qwens are good at prose, except for their
               | latest big Qwen 3.5.
               | 
               | > I don't think their corpus is lacking in western
               | knowledge,
               | 
               | It absolutely does, especially pop culture knowledge.
        
               | seanmcdirmid wrote:
               | Instruct and coder just follow instructions so well
               | though. I guess I've just never been able to make mistral
               | work well, I guess.
        
               | BoredomIsFun wrote:
               | Qwen3 30B A3B and that big 400+ B Coder were absolutely
               | terrible at editing fiction. I would tell them what to
               | change in the prose and they'd just regurgitate text with
               | no changes.
        
               | CamperBob2 wrote:
               | To be fair there are lots of worse models than OpenAI's
               | GPT-OSS-120b. It's not a standout when positioned next to
               | the latest releases from China, but prior to the current
               | wave it was considered one of the stronger local models
               | you can reasonably run.
        
             | throwaway27448 wrote:
             | They can try. I don't think they'll be able to get the
             | toothpaste back in the tube. The data will just move out of
             | the country.
        
               | seanmcdirmid wrote:
               | Many of the models on hugging face are already Chinese.
               | It's kind of obvious that local AI is going to flourish
               | more in China than the USA due to hardware constraints.
        
             | dotancohen wrote:
             | How do you choose which models to try for which workflows?
             | Do you have objective tests that you run, or do you just
             | get a feel for them while using them in your daily
             | workflow?
        
             | toofy wrote:
             | it's only a matter of time. we have all seen first hand how
             | ... wrong ... these companies behave, almost on a regular
             | basis.
             | 
             | there's a small tinfoil hat part of me that suspects part
             | of their obscene investments and cornering the hardware
             | market is driven by a conscious attempt to stop open
             | source local from taking off. they want it all, the money,
             | the control, and to be the only source of information to
             | us.
        
           | Onavo wrote:
           | Bandwidth is not that expensive. The Big 3 clouds just want
           | to milk customers via egress. Look at Hetzner or CloudFlare
           | R2 if you want to get an idea of commodity bandwidth
           | costs.
        
       | the__alchemist wrote:
       | Does anyone have a good comparison of HuggingFace/Candle to Burn?
       | I am testing them concurrently, and Burn seems to have an easier-
       | to-use API. (And can use Candle as a backend, which is confusing)
       | When I ask on Reddit or Discord channels, people overwhelmingly
       | recommend Burn, but provide no concrete reasons beyond "Candle is
       | more for inference while Burn is training and inference". This
       | doesn't track, as I've done training on Candle. So, if you've
       | used both: Thoughts?
        
         | csunoser wrote:
         | I have used both (albeit 2 years ago, and things change really
         | fast). At the time, Candle didn't have 2d conv backprop with
         | strides properly implemented. And getting Burn running libtch
         | backend was just a lot simpler.
         | 
         | I did use candle for wasm based inference for teaching purposes
         | - that was reasonably painless and pretty nice.
        
       | dhruv3006 wrote:
       | Huggingface is actually something that's driving good in the
       | world. Good to see this collab.
        
       | androiddrew wrote:
       | One of the few acquisitions I do support
        
       | tkp-415 wrote:
       | Can anyone point me in the direction of getting a model to run
       | locally and efficiently inside something like a Docker container
       | on a system with not so strong computing power (aka a MacBook M1
       | with 8 GB of memory)?
       | 
       | Is my only option to invest in a system with more computing
       | power? These local models look great, especially something like
       | https://huggingface.co/AlicanKiraz0/Cybersecurity-BaronLLM_O...
       | for assisting in penetration testing.
       | 
       | I've experimented with a variety of configurations on my local
       | system, but in the end it turns into a makeshift heater.
        
         | xrd wrote:
         | I think a better bet is to ask on reddit.
         | 
         | https://www.reddit.com/r/LocalLLM/
         | 
         | Every time I ask the same thing here, people point me there.
        
         | zozbot234 wrote:
         | The general rule of thumb is that you should feel free to
         | quantize even as low as 2 bits average if this helps you run a
         | model with more active parameters. Quantized models are not
         | perfect at all, but they're preferable to the models with
         | fewer, bigger parameters. With 8GB usable, you could run models
         | with up to 32B active at heavy quantization.
        
           | zargon wrote:
           | A large model (100B+, the more the better) may be acceptable
           | at 2-bit quantization, depending on the task. But not a small
           | model. Especially not for technical tasks. On top of that,
           | one still needs room for OS, software and KV cache. 8GB is
           | just not very useful for local LLMs. That said, it can still
           | be entertaining to try out a 4-bit 8B model for the fun of
           | it.
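
The sizes being argued over here follow from parameters times bits per weight. A rough sketch, counting weights only: real quantized files are typically somewhat larger because of per-block scales and mixed-precision layers, and the 2 GB reserve for OS, software and KV cache is an assumed figure, not a measurement.

```python
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal GB."""
    return params_billions * bits_per_weight / 8


def fits(params_billions: float, bits_per_weight: float,
         ram_gb: float, reserved_gb: float) -> bool:
    """True if the weights fit in RAM after setting aside `reserved_gb` for everything else."""
    return weight_gb(params_billions, bits_per_weight) <= ram_gb - reserved_gb


if __name__ == "__main__":
    for params, bits in ((32, 2), (8, 4)):
        size = weight_gb(params, bits)
        ok = fits(params, bits, ram_gb=8, reserved_gb=2)  # reserve is hypothetical
        print(f"{params}B at {bits}-bit: ~{size:.1f} GB, fits in 8 GB with 2 GB reserved: {ok}")
```

With these assumptions, a 32B model at 2 bits already fills all 8 GB on weights alone, while an 8B model at 4 bits leaves a few GB of headroom.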
        
             | zozbot234 wrote:
             | 100B+ is the amount of total parameters, whereas what
             | matters here is active - very different for sparse MoE
             | models. You're right that there's some overhead for the
             | OS/software stack but it's not that much. KV-cache is a
             | good candidate for being swapped out, since it only gets a
             | limited amount of writes per emitted token.
        
               | zargon wrote:
               | Total parameters, not active parameters, is the property
               | that matters for model robustness under extreme
               | quantization.
               | 
               | Once you're swapping from disk, the performance will be
               | quite unusable for most people. And for local inference,
               | KV cache is the worst possible choice to put on disk.
        
         | mft_ wrote:
         | There's no way around needing a powerful-enough system to run
         | the model. So you either choose a model that can fit on what
         | you have --i.e. via a small model, or a quantised slightly
         | larger model-- or you access more powerful hardware, either by
         | buying it or renting it. (IME you don't need Docker. For an
         | easy start just install LM Studio and have a play.)
         | 
         | I picked up a second-hand 64GB M1 Max MacBook Pro a while back
         | for not too much money for such experimentation. It's
         | sufficiently fast at running any LLM models that it can fit in
         | memory, but the gap between those models and Claude is
         | considerable. However, this might be a path for you? It can
         | also run all manner of diffusion models, but there the
         | performance suffers (vs. an older discrete GPU) and you're
         | waiting sometimes many minutes for an edit or an image.
        
           | sigbottle wrote:
           | Are mac kernels optimized compared to CUDA kernels? I know
           | that the unified GPU approach is inherently slower, but I
           | thought a ton of optimizations were at the kernel level too
           | (CUDA itself is a moat)
        
             | bigyabai wrote:
             | Mac kernels are almost always compute shaders written in
             | Metal. That's the bare-minimum of acceleration, being done
             | in a non-portable proprietary graphics API. It's optimized
             | in the loosest sense of the word, but extremely far from
             | "optimal" relative to CUDA (or hell, even Vulkan Compute).
             | 
             | Most people will not choose Metal if they're picking
             | between the two moats. CUDA is far-and-away the better
             | hardware architecture, not to mention better-supported by
             | the community.
        
             | liuliu wrote:
             | It depends on what you do. If you are doing token
             | generation, compute-dense kernel optimization is less
             | interesting (as it is memory-bound) than latency
             | optimizations elsewhere (data transfers, kernel
             | invocations, etc.). And for these, Mac devices actually
             | have a leg up on CUDA, as Metal shader pipelines are
             | pretty much optimized for latency (a.k.a. games) while
             | CUDA shaders were not (until CUDA graphs were introduced,
             | and of course there are other issues).
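
The point about token generation being limited by memory rather than compute can be turned into a back-of-the-envelope bound: each generated token has to read roughly all active weights once, so throughput cannot exceed memory bandwidth divided by the size of those weights. A minimal sketch; the bandwidth figures are placeholders rather than measured values for any device, and the bound ignores KV-cache reads, launch latency and other overheads.

```python
def weight_gb(active_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size in GB of the weights read per generated token."""
    return active_params_billions * bits_per_weight / 8


def decode_tokens_per_s_upper_bound(bandwidth_gb_s: float,
                                    active_params_billions: float,
                                    bits_per_weight: float) -> float:
    """Upper bound on decode speed when every token streams all active weights once."""
    return bandwidth_gb_s / weight_gb(active_params_billions, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical bandwidths in GB/s; substitute the figure for your own hardware.
    for bandwidth in (100, 400):
        bound = decode_tokens_per_s_upper_bound(bandwidth, 8, 4)
        print(f"{bandwidth} GB/s, 8B active at 4-bit: at most ~{bound:.0f} tokens/s")
```

Measured speeds sit below this bound, and the gap is where the latency effects mentioned above show up.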
        
             | ttoinou wrote:
             | There's this developer called nightmedia who converts a lot
             | of models to apple MLX. I can run Qwen3 coder next at 60
             | tps on my m4 max. It works
        
           | ryandrake wrote:
           | I wasn't able to have very satisfying success until I bit the
           | bullet and threw a GPU at the problem. Found an actually
           | reasonably priced A4000 Ada generation 20GB GPU on eBay and
           | never looked back. I still can't run the insanely large
           | models, but 20GB should hold me over for a while, and I
           | didn't have to upgrade my 10 year old Ivy Bridge vintage
           | homelab.
        
         | HanClinto wrote:
         | Maybe check out Docker Model Runner -- it's built on llama.cpp
         | (in a good way -- not like Ollama) and handles I think most of
         | what you're looking for?
         | 
         | https://www.docker.com/blog/run-llms-locally/
         | 
         | As far as how to find good models to run locally, I found this
         | site recently, and I liked the data it provides:
         | 
         | https://localclaw.io/
        
         | ontouchstart wrote:
         | This is the easiest setup on a Mac. You need at least 16 GB on
         | a MacBook:
         | 
         | https://github.com/ggml-org/llama.cpp/discussions/15396
        
         | Hamuko wrote:
         | I tried to run some models on my M1 Max (32 GB) Mac Studio and
         | it was a pretty miserable experience. Slow performance and
         | awful results.
        
         | 0xbadcafebee wrote:
         | 8GB is not enough to do complex reasoning, but you could do
         | very small simple things. Models like Whisper, SmolVLM,
         | Qwen2.5-0.5B, Phi-3-mini, Granite-4.0-micro, Mistral-7B,
         | Gemma3, Llama-3.2 all work on very little memory. Tiny models
         | can do a lot if you tune/train them. They also need to be used
         | differently: system prompt preloaded with information, few-shot
         | examples, reasoning guidance, single-task purpose, strict
         | output guidelines. See https://github.com/acon96/home-llm for
         | an example. For each small model, check if Unsloth has a tuned
         | version of it; it reduces your memory footprint and makes
         | inference faster.
         | 
         | For your Mac, you can use Ollama, or MLX (Mac ARM specific,
         | requires different engine and different model disk format, but
         | is faster). Ramalama may help fix bugs or ease the process
         | w/MLX. Use either Docker Desktop or Colima for the VM + Docker.
         | 
         | For today's coding & reasoning models, you need a minimum of
         | 32GB VRAM combined (graphics + system), the more in GPU the
         | better. Copying memory between CPU and GPU is too slow so the
         | model needs to "live" in GPU space. If it can't fit all in GPU
         | space, your CPU has to work hard, and you get a space heater.
         | That Mac M1 will do 5-10 tokens/s with 8GB (and CPU on full
         | blast), or 50 tokens/s with 32GB RAM (CPU idling). And now you
         | know why there's a RAM shortage.
        
           | BoredomIsFun wrote:
           | > Mistral-7B
           | 
           | Is hopelessly dated. There are much better newer models
           | around.
        
         | yjftsjthsd-h wrote:
         | With only 8 GB of memory, you're going to be running a really
         | small quant, and it's going to be slow and lower quality. But
         | yes, it should be doable. In the worst case, find a tiny gguf
         | and run it on CPU with llamafile.
        
       | option wrote:
       | Isn't HF banned in China? Also, how are many Chinese labs on
       | Twitter all the time?
       | 
       | In either case - huge thanks to them for keeping AI open!
        
         | woadwarrior01 wrote:
         | HF is indeed banned in China. The Chinese equivalent of HF is
         | ModelScope[1].
         | 
         | [1]: https://modelscope.cn/
        
         | disiplus wrote:
         | I think in the West we think everything is blocked. But for
         | example, if you book an eSIM, when you visit you already get
         | direct access to Western services because they route it to some
         | other server. Hong Kong is totally different: they basically
         | use WhatsApp and Google Maps, and everything worked when I was
         | there.
        
           | embedding-shape wrote:
           | But also yes, parent is right, HF is more or less
           | inaccessible, and ModelScope is frequently cited as the mirror
           | to use (although many Chinese labs seem to treat HF as the
           | mirror, and ModelScope as the "real" origin).
        
         | dragonwriter wrote:
         | > Isn't HF banned in China?
         | 
         | I think, for some definition of "banned", that's the case. It
         | doesn't stop the Chinese labs from having organization accounts
         | on HF and distributing models there. ModelScope is apparently
         | the HF-equivalent for reaching Chinese users.
        
       | segmondy wrote:
       | Great news! I have always worried about the long-term prospects
       | for ggml and wished for them to be rewarded for their
       | effort.
        
       | stephantul wrote:
       | Georgi is such a legend. Glad to see this happening
        
       | jgrahamc wrote:
       | This is great news. I've been sponsoring ggml/llama.cpp/Georgi
       | since 2023 via GitHub. Glad to see this outcome. I hope you don't
       | mind, Georgi, but I'm going to cancel my sponsorship now that you
       | and the code have found a home!
        
       | superkuh wrote:
       | I'm glad the llama.cpp and the ggml backing are getting
       | consistent reliable economic support. I'm glad that ggerganov is
       | getting rewarded for making such excellent tools.
       | 
       | I am somewhat anxious about "integration with the Hugging Face
       | transformers library" and possible python ecosystem entanglements
       | that might cause. I know llama.cpp and ggml already have plenty
       | of python tooling but it's not strictly required unless you're
       | quantizing models yourself or other such things.
        
       | periodjet wrote:
       | Prediction: Amazon will end up buying HuggingFace. Screenshot
       | this.
        
       | ukblewis wrote:
       | Honestly I'm shocked to be the only one I see of this opinion:
       | HuggingFace's `accelerate`, `transformers` and `datasets` have
       | been some of the worst open source Python libraries I have ever
       | had to use. They break backwards compatibility constantly, even
       | on APIs which are not underscore/dunder named, even on minor
       | version releases, without even documenting this; they refuse PRs
       | fixing their lack of `@overload` type annotations, which breaks
       | type checking on their libraries; and they just generally seem to
       | have spaghetti code. I am not excited that another team is
       | joining them and consolidating more engineering might in the
       | hands of these people.
        
         | ukblewis wrote:
         | And I said all of that despite us continuing to use their
         | platform and libraries extensively... We just don't have a
         | choice due to their dominance of open source ML
        
         | ukblewis wrote:
         | And clearly I say all of this in my name and not my employer's
         | name
        
       | 0xbadcafebee wrote:
       | > The community will continue to operate fully autonomously and
       | make technical and architectural decisions as usual. Hugging Face
       | is providing the project with long-term sustainable resources,
       | improving the chances of the project to grow and thrive. The
       | project will continue to be 100% open-source and community driven
       | as it is now.
       | 
       | I want this to be true, but business interests win out in the
       | end. Llama.cpp is now the de-facto standard for local inference;
       | more and more projects depend on it. If a company controls it,
       | that means that company controls the local LLM ecosystem. And
       | yeah, Hugging Face seems nice now... so did Google originally. If
       | we all don't want to be locked in, we either need a llama.cpp
       | competitor (with a universal abstraction), or it should be
       | controlled by an independent nonprofit.
        
         | zozbot234 wrote:
         | Llama.cpp is an open source project that anyone can fork as
         | needed, so any "control" over it really only extends to
         | facilitating development of certain features.
        
           | 0xbadcafebee wrote:
           | In practice, nobody does this, because you then have to keep
           | the fork up to date with upstream plus your changes, and this
           | is an endless amount of work.
        
       | sheepscreek wrote:
       | Curious about the financials behind this deal. Did they close
       | above what they raised? What's in it for HuggingFace?
        
       | simonw wrote:
       | It's hard to overstate the impact Georgi Gerganov and llama.cpp
       | have had on the local model space. He pretty much kicked off the
       | revolution in March 2023, making LLaMA work on consumer laptops.
       | 
       | Here's that README from March 10th 2023:
       | https://github.com/ggml-org/llama.cpp/blob/775328064e69db1eb...
       | 
       | > The main goal is to run the model using 4-bit quantization on a
       | MacBook. [...] This was hacked in an evening - I have no idea if
       | it works correctly.
       | 
       | Hugging Face have been a great open source steward of
       | Transformers, I'm optimistic the same will be true for GGML.
       | 
       | I wrote a bit about this here:
       | https://simonwillison.net/2026/Feb/20/ggmlai-joins-hugging-f...
        
         | ushakov wrote:
         | i am curious, why are your comments always pinned to the top?
        
           | carbocation wrote:
           | Because many of us think simonw has discerning taste on this
           | topic and like to read what he has to say about it, so we
           | upvote his comments.
        
             | ushakov wrote:
             | i don't doubt this. i just find it questionable that one
             | particular poster always gets in the spotlight when AI is
             | the topic - while other conversations in my opinion offer
             | more interesting angles.
        
               | jonas21 wrote:
               | Upvote the conversations that you find to be more
               | interesting. If enough people do the same, they too will
               | make it to the top.
        
               | coldtea wrote:
               | Parent implies there might be some "boosting" involved,
               | in which case, "upvote the conversations that you find to
               | be more interesting" won't change anything...
               | 
               | Not saying this is the case, but it's what the comment
               | implies, so "just upvote your faves" doesn't really
               | address it.
        
               | colesantiago wrote:
               | Agreed,
               | 
               | I would like to see others, being promoted to the top
               | rather than Simon's constant shilling for backlinks to
               | his blog every time an AI topic is on the front page.
        
           | simonw wrote:
           | At a guess that's because my comment attracted more up-votes
           | than the other top-level comments in the thread.
           | 
           | I generally try to include something in a comment that's not
           | information already under discussion - in this case that was
           | the link and quote from the original README.
        
             | ushakov wrote:
             | of course your comment attracts more upvotes - it's at the
             | top.
        
               | ontouchstart wrote:
               | Attention feeds attention.
               | 
               | Attention is ALL You Need.
        
               | seanhunter wrote:
               | It's at the top because of upvotes. They don't have an
               | "if simonw: boost" branch in the code.
        
               | ushakov wrote:
               | the code is not public, so we can't know. i think it's
               | much more nuanced and certain users' comments might get a
               | preferential treatment, based on factors other than the
               | upvote count - which itself is hidden from us.
        
               | ComplexSystems wrote:
               | > the code is not public, so we can't know.
               | 
               | I feel like you're making this statement in bad faith,
               | rather than honestly believing the developers of the
               | forum software here have built in a clause to pin
               | simonw's comments to the top.
        
               | satvikpendem wrote:
               | > _certain users' comments might get a preferential
               | treatment_
               | 
               | This does not happen. It hasn't even happened when pg
               | made the forum in the first place.
        
               | dcrazy wrote:
               | I thought dang explicitly said it does happen? It
               | certainly happens for stories.
        
           | llm_nerd wrote:
           | HN goes through phases. I remember when patio11 was the star
           | of the hour on here. At another time it was that security guy
           | (can't remember his name).
           | 
           | And for those who think it's just organic with all of the
           | upvotes, HN absolutely does have a +/- comment bias for
           | users, and it does automatically feature certain people and
           | suppress others.
        
             | imiric wrote:
             | > And for those who think it's just organic with all of the
             | upvotes, HN absolutely does have a bias for authors, and it
             | does automatically feature certain people and suppress
             | others.
             | 
             | Exactly.
             | 
             | There are configurable settings for each account, which
             | might be automatically or manually set--I'm not sure--that
             | control the initial position of a comment in threads, and
             | how long it stays there. There might be a reward system,
             | where comments from high-karma accounts are prioritized
             | over others, and accounts with "strikes", e.g. direct
             | warnings from moderators, are penalized.
             | 
             | The difference in upvotes that an account ultimately receives,
             | and thus the impact on the discussion, is quite stark. The
             | more visible a comment is, i.e. the more at the top it is,
             | the more upvotes it can collect, which in turn makes it
             | stay at the top, and so on.
             | 
             | It's safe to assume that certain accounts, such as those of
             | YC staff, mods, or alumni, or tech celebrities like simonw,
             | are given the highest priority.
             | 
             | I've noticed this on my own account. After being warned
             | for an IMO bullshit reason, my comments started to appear
             | near the middle, and quickly float down to the bottom,
             | whereas before they would usually be at the top for a few
             | minutes. The quality of what I say hasn't changed, though
             | the account's standing, and certainly the community itself,
             | has.
             | 
             | I don't mind, nor particularly care about an arbitrary
             | number. This is a proprietary platform run by a VC firm. It
             | would be silly to expect that they've cracked the code of
             | online discourse, or that their goal is to keep it
             | balanced. The discussions here are better on average than
             | elsewhere because of the community, although that also has
             | been declining over the years.
             | 
             | I still find it jarring that most people would vote on a
             | comment depending on if they agree with it or not, instead
             | of engaging with it intellectually, which often pushes
             | interesting comments to the bottom. This is an unsolved
             | problem here, as much as it is on other platforms.
        
               | Eisenstein wrote:
               | There is a saying that if everyone you encounter seems to
               | be unreasonable, maybe it isn't the other people that are
               | being unreasonable.
               | 
               | This isn't to say that social media is fair, or that
               | people vote properly or that any ranking system based on
               | agreement by readers is a good one. However, generally
               | when you are getting negativity communicated to you and
               | you are seeing consistently poor results around actions
               | you take, it is going to be useful to examine the
               | possibility that there is a difference in how you
               | perceive what you are doing vs how others do. In that
               | case spending time trying to figure out ways in which you
               | are being wronged so that you can continue in the same
               | manner is going to be time wasted.
        
               | imiric wrote:
               | How are you getting persecution complex from what I said?
               | If anything, your comment might be feeding that delusion.
               | :)
               | 
               | My point is that HN definitely has certain weights
               | associated with accounts, which control the karma,
               | visibility, and ultimately discussion of certain topics.
               | 
               | This problem doesn't affect only negativity or downvotes,
               | but upvotes as well. The most upvoted comments are not
               | necessarily of the highest quality, or contribute the
               | most to the discussion. They just happen to be the most
               | visible, and to generally align with the feeling of the
               | hive mind.
               | 
               | I know this because some of my own comments have been at
               | the top, without being anything special, while others that I
               | think are barely get any attention. I certainly examine
               | my thinking whenever it strongly aligns with the hive
               | mind, as this community does not particularly align with
               | my values.
               | 
               | I also tend to seek out comments near the bottom of
               | threads, and have dead comments enabled, precisely to
               | counteract this flawed system. I often find quality
               | opinions there, so I suggest everyone do the same as
               | well.
               | 
               | An essential feature of a healthy and interesting
               | discussion forum is to accommodate different viewpoints.
               | That starts by not burying those that disagree with the
               | majority, or boosting those that agree. AFAIK no online
               | system has gotten this right yet.
        
             | rymc wrote:
             | the security guy you mean is probably tptacek
             | (https://news.ycombinator.com/user?id=tptacek)
        
           | throwaway2027 wrote:
           | Time flies, and simonw's AI feedback isn't always received
           | favorably; sometimes he pushes it too much.
        
           | francispauli wrote:
           | thanks for reminding me i need to follow his blog weekly
           | again
        
           | satvikpendem wrote:
           | They aren't pinned, people just vote on them, and more so
           | because simonw is a recognizable name with lots of posts and
           | comments.
        
           | magicalhippo wrote:
           | New comments get a boost, and as such are frequently near the
           | top just due to that. Frequent upvotes also boost. There
           | might be other factors.
           | 
           | However these things are dynamic and change over time. As I
           | read the discussion just now, the GP comment was the ~5th
           | top-level comment.
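           | 
           | To illustrate how a recency boost can interact with votes, here
           | is a minimal sketch. HN's actual comment ranking is not public,
           | so the formula, the gravity value, and the function name
           | rank_score are all assumptions for illustration only:

```python
# Hypothetical time-decay ranking; NOT Hacker News's real algorithm.
def rank_score(points: int, age_hours: float, gravity: float = 1.8) -> float:
    """Score rises with votes and falls as the comment ages."""
    return points / (age_hours + 2) ** gravity

# (label, points, age in hours): made-up comments for illustration
comments = [
    ("old, well upvoted", 10, 8.0),
    ("new, few votes", 3, 0.5),
    ("old, few votes", 3, 8.0),
]

# Highest score first, as a ranked thread would show them
ranked = sorted(comments, key=lambda c: rank_score(c[1], c[2]), reverse=True)
for label, points, age in ranked:
    print(f"{rank_score(points, age):.3f}  {label}")
```

           | With these made-up numbers the half-hour-old comment outranks
           | the older one that has more votes, and the order shifts as both
           | age.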
        
       | mattfrommars wrote:
       | I don't know if this warrants a separate thread here but I have
       | to ask...
       | 
       | How can I realistically get involved in the AI development space?
       | I feel left out of what's going on, living in a bubble where my
       | employer forces AI on us (GitHub Copilot). What is a realistic
       | road map to slowly get into AI development, whatever that means?
       | 
       | My background is full stack development in Java and React, albeit
       | development is slow.
       | 
       | I've only messed with AI on the application side, from creating a
       | local chat bot for demo purposes to understand what RAG is about
       | to running models locally. But all of this is very superficial
       | and I feel I'm not in deep with what AI is about. I get that I'm
       | too 'late' to be on the side of building the next frontier model
       | and that it makes no sense, so what else can I do?
       | 
       | I know Python; is the next step maybe 'LLM from scratch'? Or do I
       | pick up the Google machine learning crash course certificate? Or
       | do the recently released Nvidia certification?
       | 
       | I'm open to suggestions.
        
         | fc417fc802 wrote:
         | I'm not entirely clear what your goals are but roughly, just
         | figure out an application that holds your interest and build a
         | model for it from scratch. Probably don't start with an LLM
         | though. Same as for anything else really. If you're interested in
         | computer graphics then decide on a small scale project and go
         | build it from scratch. Etc.
        
         | breisa wrote:
         | Maybe look into model finetuning/distillation. Unsloth [1] has
         | great guides and provides everything you need to get started on
         | Google Colab for free.
         | 
         | [1] https://unsloth.ai/
        
         | w10-1 wrote:
         | The competition for root and branch AI models and
         | infrastructure is intense and skilled.
         | 
         | But if you're adjacent to some leaf use-case for AI, you're
         | likely already as good as anyone else at productizing it.
         | 
         | And that's who is getting hired: people who show they can
         | deliver product-market fit.
        
         | swyx wrote:
         | go thru workshops here https://www.youtube.com/@aiDotEngineer/
        
       | fancy_pantser wrote:
       | Was Georgi ever approached by Meta? I wonder what they offered
       | (I'm glad they didn't succeed, just morbid curiosity).
        
       | karmasimida wrote:
       | Does local AI have a future? The models are getting ridiculously
       | big, any storage hardware is hoarded by a few companies for the
       | next 2 years, and Nvidia has stopped making consumer GPUs for this
       | year.
       | 
       | It seems to me there is no chance local ML is going to get
       | anywhere beyond toy status compared to closed-source models in
       | the short term.
        
         | rhdunn wrote:
         | Mistral have small variants (3B, 8B, 14B, etc.), as do others
         | like IBM Granite and Qwen. Then there are finetunes based on
         | these models, depending on your workflow/requirements.
        
           | karmasimida wrote:
           | True, but anything remotely useful is 300B and above
        
             | Eupolemos wrote:
             | That is a very broad and silly position to take, especially
             | in this thread.
             | 
             | I use Devstral 2 and Gemini 3 daily.
        
         | dust42 wrote:
         | I am now actually doing a good part of my dev work with
         | Qwen3-Coder-Next on an M1 64GB with Qwen Code CLI (a fork of
         | Gemini CLI). I very much like
         | 
         |     a) to have an idea how many tokens I use and
         |     b) to be independent of VC financed token machines and
         |     c) that I can use it on a plane/train
         | 
         | Also I never have to wait in a queue, nor will I be told to
         | wait for a few hours. And I get many answers in a second.
         | 
         | I don't do full vibe coding with a dozen agents though. I read
         | all the code it produces and guide it where necessary.
         | 
         | Last but not least, at some point the VC funded party will be over
         | and when this happens one better knows how to be highly
         | efficient in AI token use.
        
           | ttoinou wrote:
           | How many tokens per second are you getting?
           | 
           | What's the advantage of Qwen Code CLI over OpenCode?
        
             | dust42 wrote:
             | 320 tok/s PP and 42 tok/s TG with 4bit quant and MLX.
             | llama.cpp was half as fast for this model but afaik has
             | improved a few days ago; I haven't tested it yet though.
             | 
             | I have tried many tools locally and was never really happy
             | with any. I finally tried Qwen Code CLI assuming that it
             | would run well with a Qwen model and it does. YMMV, I
             | mostly do JavaScript and Python. The most important setting
             | was the max context size; it then auto compacts before
             | reaching it. I run with 65536 but may raise this a bit.
             | 
             | Last but not least, OpenCode is VC funded, so at some point
             | they will have to make money, while Gemini CLI / Qwen CLI are
             | not the primary products of their companies but are
             | definitely dogfooded.
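             | 
             | For a rough sense of what the throughput figures quoted above
             | mean in practice, here is a small sketch. It assumes PP is
             | prompt processing and TG is token generation, that the two
             | phases simply add up, and that nothing is cached; the request
             | sizes and the function name estimate_seconds are made up:

```python
# Rough latency estimate from the throughput figures quoted above.
PP_TOK_PER_S = 320.0   # prompt processing speed (tokens per second)
TG_TOK_PER_S = 42.0    # token generation speed (tokens per second)

def estimate_seconds(prompt_tokens: int, output_tokens: int) -> float:
    """Time to process the prompt plus time to generate the reply."""
    return prompt_tokens / PP_TOK_PER_S + output_tokens / TG_TOK_PER_S

# Hypothetical request sizes; 65536 is the context limit mentioned above.
for prompt, output in [(2_000, 500), (65_536, 500)]:
    print(f"{prompt:>6} prompt + {output} output tokens: "
          f"{estimate_seconds(prompt, output):.1f} s")
```

             | Under these assumptions a completely full context would take
             | minutes to process from scratch, so actual behavior depends a
             | lot on how much of the prompt can be reused between turns.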
        
       | kristianp wrote:
       | > Towards seamless "single-click" integration with the
       | transformers library
       | 
       | That's interesting. I thought they would be somewhat redundant.
       | They do similar things after all, except training.
        
       | lukebechtel wrote:
       | Thank you Georgi <3
        
       | forty wrote:
       | Looks like someone tried to type "Gmail" while drunk...
        
         | rkomorn wrote:
         | Looks like Gargamel of Smurfs fame to me.
        
       | cyanydeez wrote:
       | Is there a local webui that integrates with Hugging Face?
       | 
       | Ollama and webui seem to rapidly lose their charm. Ollama now
       | includes cloud APIs, which makes no sense for a local tool.
        
       | moralestapia wrote:
       | I hope Georgi gets a big fat check out of this, he deserves it
       | 100%.
        
       | snowhale wrote:
       | good to see them get proper backing. llama.cpp is basically
       | infrastructure at this point and relying on volunteer maintainers
       | for something this critical was starting to feel sketchy.
        
       | sbinnee wrote:
       | I am happy for the ggml team. They did so much work for quantization
       | and actually made it available to everyone. Thank you.
        
       | ontouchstart wrote:
       | I have played with both mlx-lm and llama.cpp after I bought a
       | 24GB M5 MacBook Pro last year.
       | 
       | Then I fell down the rabbit holes of uv, rust and C++ and forgot
       | about LLMs. Today, after I saw this announcement and answered
       | someone's question about how to set it up, I decided to play
       | with llama.cpp again when I got home.
       | 
       | I was surprised and impressed:
       | 
       | https://ontouchstart.github.io/rabbit-holes/llama.cpp/
       | 
       | I am not going to use mlx-lm or lmstudio anymore. llama.cpp is so
       | much fun.
        
       | car wrote:
       | So great to see my two favorite Open Source AI projects/companies
       | joining forces.
       | 
       | Since I don't see it mentioned here, _LlamaBarn_ is an awesome
       | little--but mighty--macOS menubar program, making access to
       | llama.cpp's great web UI and downloading of tastefully curated
       | models easy as pie. It automatically determines the available
       | model- and context-sizes based on available RAM.
       | 
       | https://github.com/ggml-org/LlamaBarn
       | 
       | Downloaded models live in:
       | 
       |     ~/.llamabarn
       | 
       | Apart from running on localhost, the server address and port can
       | be set via CLI:
       | 
       |     # bind to all interfaces (0.0.0.0)
       |     defaults write app.llamabarn.LlamaBarn exposeToNetwork -bool YES
       | 
       |     # or bind to a specific IP (e.g., for Tailscale)
       |     defaults write app.llamabarn.LlamaBarn exposeToNetwork -string "100.x.x.x"
       | 
       |     # disable (default)
       |     defaults delete app.llamabarn.LlamaBarn exposeToNetwork
        
         | noisy_boy wrote:
         | GitHub is showing me the unicorn - is there a Linux equivalent? I
         | have an old ThinkPad with a puny Nvidia GPU, can I hope to find
         | anything useful to run on that?
        
           | car wrote:
           | Building llama.cpp from source with CUDA enabled should get
           | you pretty far. llama-server has a really good web UI, and the
           | latest version supports model switching.
           | 
           | As for models, plenty of GGUF-quantized ones (down to 2-bit)
           | are available on HF and ModelScope.
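           | 
           | To judge what might fit on a small GPU, a back-of-the-envelope
           | sketch can help. It assumes weight size is simply parameters
           | times bits per weight; the function name weights_gb is made up,
           | and real GGUF files come out somewhat larger because quantized
           | blocks also store scales, and the KV cache needs memory too:

```python
# Back-of-the-envelope estimate of quantized weight size.
# Assumption: size = parameters * bits_per_weight / 8. This ignores the KV
# cache, runtime buffers, and the per-block scales real quant types add.
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for params in (3, 8, 14):      # model sizes in billions of parameters
    for bits in (2, 4, 8):     # nominal quantization widths
        print(f"{params}B @ {bits}-bit ~ {weights_gb(params, bits):.2f} GB")
```

           | By this estimate an 8B model at a nominal 4 bits needs roughly
           | 4 GB for weights alone, so on an older laptop GPU the smaller
           | models or lower-bit quants are the realistic candidates.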
        
       | am17an wrote:
       | One often overlooked fact is that ggml, the tensor library that
       | runs llama.cpp, is not based on pytorch, but rather just plain
       | cpp. In a world where pytorch dominates, it shows that
       | alternatives are possible and worth pursuing.
        
       | mhher wrote:
       | It's great to see the ggml team getting proper backing. Keeping
       | inference in bare-metal C/C++ without the Python bloat is the
       | only way local AI is going to scale efficiently. Well deserved
       | for Georgi, Johannes, Piotr, and the rest of the team.
        
       | jpcompartir wrote:
       | This is great, brings clear benefits to both sides and the rest
       | of us.
       | 
       | Always rooting for Hugging Face
        
       ___________________________________________________________________
       (page generated 2026-02-21 23:02 UTC)