[HN Gopher] WaveCoder: Enhanced instruction tuning with refined ...
___________________________________________________________________
WaveCoder: Enhanced instruction tuning with refined data generation
Author : tosh
Score : 20 points
Date : 2024-01-17 13:35 UTC (9 hours ago)
(HTM) web link (arxiv.org)
| ilaksh wrote:
| Did anyone find the source code yet?
| bugglebeetle wrote:
| They said on Twitter that they're still conferring with
| Microsoft internally on the extent and nature of the open-
| source release:
|
| https://nitter.net/TeamCodeLLM_AI/status/1747652471714144702
| SubiculumCode wrote:
| Is synthetic data a really big deal right now for LLMs? If so, are
| there any take-home ideas that might apply to other areas, say
| analysis of MRI?
| thatguysaguy wrote:
| I think the critical thing is you need some ground truth way of
| evaluating the synthetic data. You can generate 100 programs
| with your LLM and filter to the 1-2 that solve the problem, but
| there's not an equivalent option for things like MRI.
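The generate-then-filter idea in the comment above can be sketched as follows. This is a minimal illustration, not the WaveCoder pipeline: the candidate strings stand in for LLM samples, and the names passes_tests, filter_candidates, CANDIDATES and CASES are hypothetical. The ground truth here is a small set of input/output cases, which is what makes filtering possible for code.

```python
from typing import List, Sequence, Tuple

Case = Tuple[tuple, object]  # (positional args, expected return value)


def passes_tests(source: str, func_name: str, cases: Sequence[Case]) -> bool:
    """Return True if `source` defines `func_name` and it passes every case."""
    namespace: dict = {}
    try:
        # WARNING: exec runs arbitrary code. Real pipelines would sandbox this
        # (separate process, time limits); it is done inline only for brevity.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing functions and runtime errors all count as failures.
        return False


def filter_candidates(
    candidates: Sequence[str], func_name: str, cases: Sequence[Case]
) -> List[str]:
    """Keep only the candidate programs that pass all ground-truth cases."""
    return [src for src in candidates if passes_tests(src, func_name, cases)]


# Hypothetical stand-ins for programs sampled from an LLM.
CANDIDATES = [
    "def add(a, b):\n    return a - b\n",   # wrong logic
    "def add(a, b):\n    return a + b\n",   # correct
    "def add(a, b)\n    return a + b\n",    # syntax error
    "def plus(a, b):\n    return a + b\n",  # wrong function name
]
CASES = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]

kept = filter_candidates(CANDIDATES, "add", CASES)
print(f"kept {len(kept)} of {len(CANDIDATES)} candidates")
```

The filter is only as good as the test cases: a candidate that passes a weak test set can still be wrong, which is the same ground-truth problem the comment raises for domains like MRI.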
| cwmoore wrote:
| A self-debiasing estimator might become unreliable, and
| brains think that matters?
| bugglebeetle wrote:
| Synthetic data is a big deal, essentially as a form of
| "knowledge distillation" from large models or for transforming
| high-quality text into training data (e.g. Q&A pairs). Almost
| everyone is using GPT-4 for this. Dunno about other domains, as
| it's based on the mutability of text, relative to whatever
| ground truths are embedded therein. This seems less feasible
| for other kinds of inputs, but who knows.
| ipsum2 wrote:
| Yes and no. In terms of LLMs, it's basically figuring out how
| to exfiltrate information from GPT-4 to remove costs of data
| gathering. The limitations of that are that the model will
| never be better than GPT-4, and when GPT-4 produces incorrect
| information, the model trained on synthetic data will also do
| so.
|
| In other fields like computer vision, synthetic data is useful
| for generating ground truth data, like for depth masks.
___________________________________________________________________
(page generated 2024-01-17 23:00 UTC)