[HN Gopher] Exploring Microsoft's Phi-3-Mini and its integration...
___________________________________________________________________
Exploring Microsoft's Phi-3-Mini and its integration with tools like
Ollama
Author : Nik0912
Score : 44 points
Date : 2024-12-26 14:09 UTC (2 days ago)
(HTM) web link (pieces.app)
(TXT) w3m dump (pieces.app)
| maccam912 wrote:
| Is there any rule of thumb for small language models vs large
| language models? I've seen Phi-4 called a small language model,
| but with 14 billion parameters it's larger than some large
| language models.
| ekianjo wrote:
| 7B to 9B is usually what we call small. The rule of thumb is a
| model that you can run on a single GPU.
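|
| (As a back-of-envelope sketch of why that rule holds; the
| 4-bit quantization and ~20% overhead figures here are
| illustrative assumptions, not exact numbers:)
|
|     # Rough VRAM estimate for a quantized model
|     def vram_estimate_gb(params_billion, bits_per_param=4,
|                          overhead=1.2):
|         # weights in GB, plus ~20% for KV cache/activations
|         weights_gb = params_billion * bits_per_param / 8
|         return weights_gb * overhead
|
|     print(vram_estimate_gb(7))   # ~4.2 GB: fits one GPU
|     print(vram_estimate_gb(70))  # ~42 GB: needs several GPUs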
| exitb wrote:
| It's not a useful distinction. The first LLMs had less than 1
| billion parameters anyway.
| ron0c wrote:
| This is the AI I am excited for. Data and execution local to my
| machine. I think Intel is betting on this with its Copilot-ready
| processors. I hope Ollama or other local AI services will be
| able to utilize these co-processors soon.
| ekianjo wrote:
| The NPUs on laptops don't have access to enough memory to run
| very large models.
| talldayo wrote:
| Oftentimes they do. If they don't, it's not very hard to page
| memory to and from the NPU until the operation is completed.
|
| The bigger problem is that this NPU hardware isn't built
| around scaling to larger models. It's laser-focused on dense
| computation and low-precision inference, which usually isn't
| much more efficient than running the same matmul as a compute
| shader. For Whisper-scale models that don't require insanely
| high precision or super sparse decoding, NPU hardware can
| work great. For LLMs it is almost always going to be slower
| than a well-tuned GPU.
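|
| (To make "dense, low-precision computation" concrete, here
| is a minimal numpy sketch of the kind of int8 matmul an NPU
| accelerates; the per-tensor scaling scheme is an
| illustrative simplification:)
|
|     import numpy as np
|
|     a = np.random.randn(64, 64).astype(np.float32)
|     b = np.random.randn(64, 64).astype(np.float32)
|
|     # Quantize to int8 with per-tensor scales
|     sa = np.abs(a).max() / 127
|     sb = np.abs(b).max() / 127
|     qa = np.round(a / sa).astype(np.int8)
|     qb = np.round(b / sb).astype(np.int8)
|
|     # Multiply in low precision, accumulate in int32 to
|     # avoid overflow, then rescale back to float
|     acc = qa.astype(np.int32) @ qb.astype(np.int32)
|     approx = acc.astype(np.float32) * sa * sb
|
|     print(np.abs(approx - a @ b).max())  # small error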
| 650REDHAIR wrote:
| Right, but do most people need access to a huge model
| locally?
| ben_w wrote:
| Most people shouldn't host locally at all.
|
| Of those who do, I can see students and researchers
| benefiting from small models. Students in particular are
| famously short on money for fancy hardware.
|
| My experience trying one of the Phi models (I think 3,
| might have been 2) was brief, because it failed so hard: my
| first test was to ask for a single-page web app Tetris
| clone, and not only was the first half of the output simply
| doing that task wrong, the second half was a sudden sharp
| turn into Python code to train an ML model. It didn't even
| delimit the transition: one line JavaScript, the next
| Python.
| diggan wrote:
| > My experience trying one of the Phi models (I think 3,
| might have been 2) was brief
|
| The Phi models are tiny LMs; maybe SLM (Small Language
| Model) is a more fitting label than LLM. As such, you
| cannot throw even semi-complicated problems at them. Things
| like "autocomplete" and other simpler things are the use
| cases you'd use them for, not "code this game for me";
| you'll need something much more powerful for that.
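|
| (A minimal sketch of that kind of lightweight use, assuming
| a local Ollama server with the phi3 model already pulled:)
|
|     import json
|     import urllib.request
|
|     # One-shot completion against Ollama's HTTP API
|     # (default port 11434); run `ollama pull phi3` first.
|     req = urllib.request.Request(
|         "http://localhost:11434/api/generate",
|         data=json.dumps({
|             "model": "phi3",
|             "prompt": "def fibonacci(n):",
|             "stream": False,
|         }).encode(),
|         headers={"Content-Type": "application/json"},
|     )
|     with urllib.request.urlopen(req) as resp:
|         print(json.loads(resp.read())["response"])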
| ben_w wrote:
| > Things like "autocomplete" and other simpler things are
| the use cases you'd use them for, not "code this game for
| me"; you'll need something much more powerful for that.
|
| Indeed, clearly.
|
| However, it was tuned for chat, and people kept telling
| me it was competitive with the OpenAI models for coding.
| miohtama wrote:
| Maybe a better solution is a privately hosted cloud solution,
| or just any SaaS that cannot violate data privacy by design.
| sofixa wrote:
| > any SaaS that cannot violate data privacy by design
|
| And that is hosted in a jurisdiction that forces them to take
| it seriously, e.g. Mistral in France, which has to comply with
| GDPR and any EU AI and privacy regulations.
| msoad wrote:
| In my opinion there is room for both small, fast models and
| large, slow but much smarter models. Use cases like phone
| keyboard autocomplete and next-few-words suggestion in coding
| or writing need very fast models, which by definition should be
| small. Very large models that are much smarter are also useful,
| for instance for debugging issues or proofreading long letters.
|
| Cursor really aced this. The Cursor model is very fast at
| suggesting useful inline completions and leaves the big
| problems to big models.
| mycall wrote:
| Could chaining models together via tool calls, with
| benchmark-based routing to redirect each request to the best
| model, allow smaller models to perform as well as bigger
| models in memory-constrained/local environments?
| isoprophlex wrote:
| Yes, indeed, see for example https://arxiv.org/abs/2310.03094
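|
| (A minimal sketch of such a cascade, assuming the official
| ollama Python client and a running local server; the length
| check stands in for a real quality signal, which in
| practice would be benchmark- or verifier-based:)
|
|     import ollama  # pip install ollama
|
|     def cascade(prompt, small="phi3", large="llama3"):
|         # Try the cheap model first; escalate only when
|         # the cheap quality check fails.
|         answer = ollama.generate(model=small,
|                                  prompt=prompt)["response"]
|         if len(answer.strip()) > 20:
|             return answer
|         return ollama.generate(model=large,
|                                prompt=prompt)["response"]
|
|     print(cascade("Explain a mutex in one sentence."))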
| akudha wrote:
| Apologies for the dumb question - can these models be used at my
| work, i.e., for commercial purposes? What is the legality of it?
| smallerize wrote:
| In the USA, code generated by a computer cannot be copyrighted.
| So you can use it for commercial purposes, but you can't
| control it the way you could with code that you wrote yourself.
| And that's legally fine, but your company's legal department
| might not like that idea.
| akudha wrote:
| But this model can be used for more than generating code, no?
| lodovic wrote:
| That's not entirely accurate. In the US, computer-generated
| code can be copyrighted. The key point is that copyright
| protection extends to the original expression in the code,
| but not to its functional aspects, such as algorithms, system
| design, or logic.
| minimaxir wrote:
| Phi-3-mini has an MIT license, which is commercially friendly:
| https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
| nicce wrote:
| Do we know for sure that the model is not trained on
| copyrighted material or GPL-licensed code? That is the
| biggest issue right now.
| minimaxir wrote:
| That is the case with every LLM (except a couple of research
| experiments) and won't be resolved until the courts rule on
| it.
|
| Literally every tech company that uses LLMs would be in
| legal trouble if that becomes the precedent.
| nicce wrote:
| Yes. It is a bigger problem than the correct license of
| the model, and I feel the original commenter is not
| aware of that.
|
| Many companies are waiting for court decisions and are
| not using even GitHub Copilot. There is even a growing
| business in analyzing binaries and source code to
| determine whether they use GPL code.
___________________________________________________________________
(page generated 2024-12-28 23:00 UTC)