[HN Gopher] Exploring Microsoft's Phi-3-Mini and its integration...
       ___________________________________________________________________
        
       Exploring Microsoft's Phi-3-Mini and its integration with tools like
       Ollama
        
       Author : Nik0912
       Score  : 44 points
       Date   : 2024-12-26 14:09 UTC (2 days ago)
        
 (HTM) web link (pieces.app)
 (TXT) w3m dump (pieces.app)
        
       | maccam912 wrote:
       | Is there any rule of thumb for small language models vs large
       | language models? I've seen phi 4 called a small language model
       | but with 14 billion parameters, it's larger than some large
       | language models.
        
         | ekianjo wrote:
         | 7B to 9B is usually what we call small. The rule of thumb is a
         | model that you can run on a single GPU.
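         | 
         | As a rough illustration, the memory needed just to hold the
         | weights scales with parameter count and quantization width.
         | This is only a back-of-the-envelope sketch in Python (KV cache
         | and activations add overhead on top), using Phi-3-mini's ~3.8B
         | parameters and phi-4's ~14B as the data points:
         | 
         |   # rough weight-only memory estimate, in GB
         |   def weight_memory_gb(params_billions, bits_per_weight):
         |       return params_billions * bits_per_weight / 8
         | 
         |   # e.g. 4-bit quantized weights
         |   for name, b in [("phi-3-mini", 3.8), ("phi-4", 14.0)]:
         |       print(name, round(weight_memory_gb(b, 4), 1), "GB")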
        
         | exitb wrote:
         | It's not a useful distinction. The first LLMs had less than 1
         | billion parameters anyway.
        
       | ron0c wrote:
       | This is the AI I am excited for: data and execution local to my
       | machine. I think Intel is betting on this with its
       | Copilot-enabled processors. I hope Ollama or other local AI
       | services will be able to utilize these co-processors soon.
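       | 
       | For anyone who wants to try the local route today, talking to a
       | model like Phi-3-mini through Ollama is just an HTTP call to its
       | local REST API. A minimal Python sketch (assumes the phi3 model
       | has already been pulled and the Ollama server is running on its
       | default port):
       | 
       |   import requests
       | 
       |   resp = requests.post(
       |       "http://localhost:11434/api/generate",
       |       json={"model": "phi3",
       |             "prompt": "Explain NPUs in one sentence.",
       |             "stream": False},
       |   )
       |   print(resp.json()["response"])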
        
         | ekianjo wrote:
         | The NPUs on laptops don't have access to enough memory to run
         | very large models.
        
           | talldayo wrote:
           | Oftentimes they do. If they don't, it's not very hard to page
           | memory to and from the NPU until the operation is completed.
           | 
           | The bigger problem is that this NPU hardware isn't built
           | around scaling to larger models. It's laser-focused on dense
           | computation and low-precision inference, which usually isn't
           | much more efficient than running the same matmul as a compute
           | shader. For Whisper-scale models that don't require insanely
           | high precision or super sparse decoding, NPU hardware can
           | work great. For LLMs it is almost always going to be slower
           | than a well-tuned GPU.
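           | 
           | To make "low-precision inference" concrete, here is a toy
           | weight-only int8 quantization of a single dense matmul in
           | NumPy. NPUs do this kind of arithmetic in dedicated
           | hardware, but the idea is the same:
           | 
           |   import numpy as np
           | 
           |   rng = np.random.default_rng(0)
           |   W = rng.standard_normal((256, 256)).astype(np.float32)
           |   x = rng.standard_normal(256).astype(np.float32)
           | 
           |   # store the weights in 8 bits plus one fp32 scale
           |   scale = np.abs(W).max() / 127.0
           |   W_q = np.round(W / scale).astype(np.int8)
           | 
           |   # dequantize on the fly, run the same dense matmul
           |   y_approx = (W_q.astype(np.float32) * scale) @ x
           |   print(np.max(np.abs(y_approx - W @ x)))  # small error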
        
           | 650REDHAIR wrote:
           | Right, but do most people need access to a huge model
           | locally?
        
             | ben_w wrote:
             | Most people shouldn't host locally at all.
             | 
             | Of those who do, I can see students and researchers
             | benefiting from small models. Students in particular are
             | famously short on money for fancy hardware.
             | 
             | My experience trying one of the Phi models (I think 3,
             | might have been 2) was brief, because it failed so hard: my
             | first test was to ask for a single-page web app Tetris
             | clone, and not only did the first half of the output simply
             | do that task wrong, the second half was a sudden sharp turn
             | into Python code to train an ML model. It didn't even
             | delimit the transition: one line JavaScript, the next
             | Python.
        
               | diggan wrote:
               | > My experience trying one of the Phi models (I think 3,
               | might have been 2) was brief
               | 
               | The Phi models are tiny LMs; maybe SLM is a more fitting
               | label than LLM (Large -> Small). As such, you cannot
               | throw even semi-complicated problems at them. Things like
               | "autocomplete" and other simpler things are the use cases
               | you'd use it for, not "code this game for me"; you'll
               | need something much more powerful for that.
        
               | ben_w wrote:
               | > Things like "autocomplete" and other simpler things are
               | the use cases you'd use it for, not "code this game for
               | me", you'll need something much more powerful for that.
               | 
               | Indeed, clearly.
               | 
               | However, it was tuned for chat, and people kept telling
               | me it was competitive with the OpenAI models for coding.
        
         | miohtama wrote:
         | Maybe a better solution is a privately hosted cloud solution,
         | or just any SaaS that cannot violate data privacy by design.
        
           | sofixa wrote:
           | > any SaaS that cannot violate data privacy by design
           | 
           | And that is hosted in a jurisdiction that forces them to
           | take it seriously, e.g. Mistral in France, which has to
           | comply with GDPR and any AI and privacy regulations coming
           | out of the EU.
        
       | msoad wrote:
       | In my opinion there is room for both small-and-fast models and
       | large-and-slow but much smarter models. Use cases like phone
       | keyboard autocomplete and next-few-words suggestions in coding
       | or writing need very fast models that are, by definition, small.
       | Very large models that are much smarter are also useful, for
       | instance for debugging issues or proofreading long letters.
       | 
       | Cursor really aced this. The Cursor model is very fast at
       | suggesting useful inline completions and leaves the big problems
       | to bigger models.
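       | 
       | That split can be as simple as a tiny router in front of the two
       | tiers. The model names and the length heuristic in this sketch
       | are placeholders, not how Cursor actually does it:
       | 
       |   def pick_model(task: str, prompt: str) -> str:
       |       # short, latency-sensitive work goes to the small model
       |       if task == "autocomplete" or len(prompt) < 200:
       |           return "phi3"          # small, fast, local
       |       return "llama3:70b"        # large, slower, smarter
       | 
       |   print(pick_model("autocomplete", "def parse_args("))
       |   print(pick_model("debug", "long stack trace ..."))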
        
         | mycall wrote:
         | Could chaining models together via tool calls, using benchmark
         | results to redirect each request to the best model, allow
         | smaller models to perform as well as bigger models in
         | memory-constrained/local environments?
        
           | isoprophlex wrote:
           | Yes, indeed, see for example https://arxiv.org/abs/2310.03094
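           | 
           | In its simplest form that is a cascade: answer with the
           | small model first and escalate only when the draft looks
           | weak. A toy sketch against Ollama's local API; the model
           | names and the escalation heuristic are only placeholders:
           | 
           |   import requests
           | 
           |   def ask(model, prompt):
           |       r = requests.post(
           |           "http://localhost:11434/api/generate",
           |           json={"model": model, "prompt": prompt,
           |                 "stream": False})
           |       return r.json()["response"]
           | 
           |   def cascade(prompt):
           |       draft = ask("phi3", prompt)
           |       # escalate if the small model hedges or answers thinly
           |       if len(draft) < 20 or "not sure" in draft.lower():
           |           return ask("llama3:70b", prompt)
           |       return draft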
        
       | akudha wrote:
       | Apologies for the dumb question - can these models be used at my
       | work, i.e., for commercial purposes? What is the legality of it?
        
         | smallerize wrote:
         | In the USA, code generated by a computer cannot be copyrighted.
         | So you can use it for commercial purposes, but you can't
         | control it the way you could with code that you wrote yourself.
         | And that's legally fine, but your company's legal department
         | might not like that idea.
        
           | akudha wrote:
           | But this model can be used for more than generating code, no?
        
           | lodovic wrote:
           | That's not entirely accurate. In the US, computer-generated
           | code can be copyrighted. The key point is that copyright
           | protection extends to the original expression in the code,
           | but not to its functional aspects, such as algorithms, system
           | design, or logic.
        
         | minimaxir wrote:
         | Phi-3-mini has an MIT license, which is commercial-friendly:
         | https://huggingface.co/microsoft/Phi-3-mini-128k-instruct
        
           | nicce wrote:
           | Do we know for sure that the model is not trained on
           | copyrighted material or on GPL-licensed code? That is the
           | biggest issue right now.
        
             | minimaxir wrote:
             | That is the case with every LLM (except a couple of
             | research experiments) and won't be resolved until the
             | courts rule on it.
             | 
             | Literally every tech company that uses LLMs would be in
             | legal trouble if that becomes the precedent.
        
               | nicce wrote:
               | Yes. It is a bigger problem than the correct license of
               | the model, and I feel the original commenter is not
               | aware of that.
               | 
               | Many companies are waiting for court decisions and are
               | not even using GitHub Copilot. There is even a growing
               | business in analyzing binaries and source code to
               | determine whether they use GPL code.
        
       ___________________________________________________________________
       (page generated 2024-12-28 23:00 UTC)