[HN Gopher] Open-Source Data Collection Platform for LLM Fine-Tu...
       ___________________________________________________________________
        
       Open-Source Data Collection Platform for LLM Fine-Tuning and RLHF
        
       Author : dvilasuero
       Score  : 75 points
       Date   : 2023-06-05 17:37 UTC (5 hours ago)
        
 (HTM) web link (argilla.io)
        
       | anakin87 wrote:
       | In my experience, Argilla is a good open-source platform for
       | data-centric NLP. And these features are a great addition... Have
       | you tried it?
        
         | dvilasuero wrote:
         | Thanks, Anakin! We want to bring the data-centric approach to
         | how LLMs are built and fine-tuned too.
        
       | sathergate wrote:
       | How does this compare to Scale's or Surge's offerings?
        
         | dvilasuero wrote:
         | Thanks! The main difference is that Argilla is built as an
         | open-source component to be integrated into the wider
         | MLOps/LLMOps stack. The focus is on continuous data
         | collection, monitoring, and fine-tuning with open-source and
         | commercial LLMs, as opposed to outsourcing training data
         | collection and running one-off labeling projects. In the blog
         | post we put it this way:
         | 
         | Domain Expertise vs Outsourcing. In Argilla, the process of
         | data labeling and curation is not a single event but an
         | iterative component of the ML lifecycle, setting it apart from
         | traditional data labeling platforms. Argilla integrates into
         | the MLOps stack, using feedback loops for continuous data and
         | model refinement. Given the current complexity of LLM feedback,
         | organizations are increasingly leveraging their own internal
         | knowledge and expertise instead of outsourcing training sets to
         | data labeling services. Argilla supports this shift
         | effectively.
         | 
         | I'd love to hear your thoughts on this!
        
           | sathergate wrote:
           | OSS approach makes sense!
        
       | xrd wrote:
       | Looks like no quantized options with llama.cpp?
       | 
       | https://github.com/ggerganov/llama.cpp/issues/1602
        
         | dvilasuero wrote:
         | We're very much looking forward to seeing Falcon-40B support on
         | llama.cpp. For production use cases, this is also highly
         | relevant: https://huggingface.co/blog/sagemaker-huggingface-llm
        
       | dvilasuero wrote:
       | I'm Dani, CEO and co-founder of Argilla.
       | 
       | Happy to answer any questions you might have and excited to hear
       | your thoughts!
       | 
       | More about Argilla
       | 
       | GitHub: https://github.com/argilla-io/argilla
       | Docs: https://docs.argilla.io
        
       ___________________________________________________________________
       (page generated 2023-06-05 23:01 UTC)