hngopher.com

       [HN Gopher] Show HN: Autolabel, a Python library to label and en...
       ___________________________________________________________________
        
       Show HN: Autolabel, a Python library to label and enrich text data
       with LLMs
        
       Hi HN! I'm excited to share Autolabel, an open-source Python
       library to label and enrich text datasets with any Large Language
       Model (LLM) of your choice.  We built Autolabel because access to
       clean, labeled data is a huge bottleneck for most ML/data science
       teams. The most capable LLMs are able to label data with high
       accuracy, and at a fraction of the cost and time compared to manual
       labeling. With Autolabel, you can leverage LLMs to label any text
       dataset with <5 lines of code.  We're eager for your feedback!
        
       Author : nihit-desai
       Score  : 76 points
       Date   : 2023-06-20 19:26 UTC (3 hours ago)
        
 (HTM) web link (github.com)
        
       | viswajithiii wrote:
       | Thank you for open sourcing this! This seems very useful,
       | especially because of the confidence estimation, which lets you
       | use LLMs for the points they can do well and fall back to manual
       | labelling for the rest.
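
The fallback described above can be sketched as a simple confidence
threshold. This is a minimal illustration, not Autolabel's actual API:
the `Prediction` class, the `route_by_confidence` helper, and the 0.8
cutoff are all hypothetical, and it assumes each LLM label comes with a
confidence score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical record for one LLM-labeled example.
    text: str
    label: str
    confidence: float  # assumed to lie in [0, 1]


def route_by_confidence(predictions, threshold=0.8):
    """Split predictions into auto-accepted and manual-review lists."""
    accepted, needs_review = [], []
    for p in predictions:
        # Keep the LLM label only when its confidence clears the cutoff.
        (accepted if p.confidence >= threshold else needs_review).append(p)
    return accepted, needs_review


# Made-up example data, for illustration only.
preds = [
    Prediction("great product", "positive", 0.97),
    Prediction("it arrived", "neutral", 0.55),
    Prediction("never again", "negative", 0.91),
]
accepted, needs_review = route_by_confidence(preds, threshold=0.8)
```

In practice the threshold would probably be tuned on a small
hand-labeled sample, trading label accuracy against how much data goes
to manual review.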
        
       | voz_ wrote:
       | You just posted this here
       | https://news.ycombinator.com/item?id=36384015
       | 
       | It's one thing to show HN / share, it's another thing to spam it
       | with your ads.
        
         | nihit-desai wrote:
         | Hi!
         | 
         | The earlier post was a report summarizing LLM labeling
         | benchmarking results. This post shares the open source library.
         | 
         | Neither is intended to be an ad. Our hope with sharing these is
         | to demonstrate how LLMs can be used for data labeling, and get
         | feedback from the community.
        
       | isawczuk wrote:
       | You should carefully read OpenAI's terms and conditions before
       | using it to build custom datasets.
        
         | victorbjorklund wrote:
         | Which part?
        
           | iillexial wrote:
           | For anyone wondering, it's here:
           | https://openai.com/policies/terms-of-use
           | 
           | >use output from the Services to develop models that compete
           | with OpenAI;
           | 
           | Well, I can still use ChatGPT labeling for many other
           | purposes anyway.
        
             | binarymax wrote:
             | There's some room for interpretation here. Are small
             | sentiment analysis models competing with a large general
             | purpose generative model? OpenAI doesn't provide the
             | former.
             | 
             | I see competing models as ones like LLaMA, Falcon, etc.,
             | which would fall under the terms in my interpretation.
        
           | moffkalast wrote:
           | The part that says you shouldn't take outputs from their
           | models to build datasets for training competitor models.
           | 
           | Outputs from models that they trained on stolen ebooks,
           | unpaid reddit data, data scraped from millions of websites
           | without credit, etc. Sort of like stealing a bike and then
           | getting mad that it got stolen again later, because it was
           | clearly rightfully yours.
           | 
           | https://i.pinimg.com/originals/d7/72/22/d77222df469b50e3b4cd...
        
             | chillbill wrote:
             | I get your point but your analogy doesn't quite work.
        
       ___________________________________________________________________
       (page generated 2023-06-20 23:00 UTC)