[HN Gopher] Show HN: Autolabel, a Python library to label and en...
___________________________________________________________________
Show HN: Autolabel, a Python library to label and enrich text data
with LLMs
Hi HN! I'm excited to share Autolabel, an open-source Python
library to label and enrich text datasets with any Large Language
Model (LLM) of your choice. We built Autolabel because access to
clean, labeled data is a huge bottleneck for most ML/data science
teams. The most capable LLMs are able to label data with high
accuracy, and at a fraction of the cost and time compared to manual
labeling. With Autolabel, you can leverage LLMs to label any text
dataset with <5 lines of code. We're eager for your feedback!
Author : nihit-desai
Score : 76 points
Date : 2023-06-20 19:26 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| viswajithiii wrote:
| Thank you for open sourcing this! This seems very useful,
| especially because of the confidence estimation, which lets you
| use LLMs for the points they can do well and fall back to manual
| labelling for the rest.
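The fallback idea in the comment above can be sketched in a few lines of plain Python. This is only an illustration, not Autolabel's API: the `Prediction` class, the `route_by_confidence` helper, the 0.8 threshold, and the sample rows are all hypothetical, and it assumes each LLM label comes with a confidence score in [0, 1].

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    """One LLM-produced label (hypothetical structure, for illustration)."""
    text: str
    label: str
    confidence: float  # assumed to be a score in [0, 1]


def route_by_confidence(
    predictions: List[Prediction], threshold: float = 0.8
) -> Tuple[List[Prediction], List[Prediction]]:
    """Split predictions into auto-accepted and manual-review lists."""
    auto: List[Prediction] = []
    manual: List[Prediction] = []
    for p in predictions:
        # Keep the LLM label only when its confidence meets the threshold;
        # everything else is queued for a human labeler.
        (auto if p.confidence >= threshold else manual).append(p)
    return auto, manual


# Made-up sample data, purely to show the routing.
preds = [
    Prediction("Great product, would buy again", "positive", 0.97),
    Prediction("It arrived on a Tuesday", "neutral", 0.55),
    Prediction("Terrible support experience", "negative", 0.91),
]
accepted, needs_review = route_by_confidence(preds, threshold=0.8)
```

In practice the threshold would probably be tuned on a small hand-labeled sample, trading the fraction of rows labeled automatically against the accuracy of the accepted labels.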
| voz_ wrote:
| You just posted this here
| https://news.ycombinator.com/item?id=36384015
|
| It's one thing to show HN / share, it's another thing to spam it
| with your ads.
| nihit-desai wrote:
| Hi!
|
| The earlier post was a report summarizing LLM labeling
| benchmarking results. This post shares the open source library.
|
| Neither is intended to be an ad. Our hope with sharing these is
| to demonstrate how LLMs can be used for data labeling, and get
| feedback from the community.
| [deleted]
| isawczuk wrote:
| You should carefully read OpenAI's terms and conditions before
| using it to build custom datasets.
| victorbjorklund wrote:
| Which part?
| iillexial wrote:
| For anyone wondering, it's here
| https://openai.com/policies/terms-of-use:
|
| >use output from the Services to develop models that compete
| with OpenAI;
|
| Well, I can still use ChatGPT labeling for many other
| purposes anyway.
| binarymax wrote:
| There's some room for interpretation here. Are small
| sentiment analysis models competing with a large general
| purpose generative model? OpenAI doesn't provide the
| former.
|
| I see competing models as ones like LLaMA, Falcon, etc.,
| which would fall under the terms in my interpretation.
| moffkalast wrote:
| The part that says you shouldn't take outputs from their
| models to build datasets for training competitor models.
|
| Outputs from models that they trained on stolen ebooks,
| unpaid Reddit data, data scraped from millions of websites
| without credit, etc. Sort of like stealing a bike and then
| getting mad that it got stolen again later, because it was
| clearly rightfully yours.
|
| https://i.pinimg.com/originals/d7/72/22/d77222df469b50e3b4cd.
| ..
| chillbill wrote:
| I get your point but your analogy doesn't quite work.
| fiknbddsehu wrote:
| [flagged]
___________________________________________________________________
(page generated 2023-06-20 23:00 UTC)