[HN Gopher] Q* Hypothesis: Enhancing Reasoning, Rewards, and Synthetic Data
___________________________________________________________________
Q* Hypothesis: Enhancing Reasoning, Rewards, and Synthetic Data
Author : Jimmc414
Score : 83 points
Date : 2023-11-24 19:02 UTC (3 hours ago)
(HTM) web link (www.interconnects.ai)
(TXT) w3m dump (www.interconnects.ai)
| romesc wrote:
| Sure A* is awesome, but taking the "star" and immediately
| attributing it to A* is probably a bridge too far.
|
| Q*, or any X* for that matter, is extremely common notation for
| the optimal function under certain assumptions (usually about the
| cost / reward structure).
| tunesmith wrote:
| Yeah I just saw the video from that researcher (later an OpenAI
| researcher?) who talked about it back in 2016... not that I
| understood much, but it definitely seemed that Q* was a
| generalization of the Q algorithm described on the previous
| slide. The optimum something across all somethings.
| maaaaattttt wrote:
| If possible, I would be quite interested in a link to the
| video, or alternatively the name of the researcher you
| mention.
| resource0x wrote:
| LeCun: Please ignore the deluge of complete nonsense about
| Q*. https://twitter.com/ylecun/status/1728126868342145481
| Zolde wrote:
| It will be nice to see the breakthroughs resulting from what
| people _believed_ Q* to have been.
| erikaww wrote:
| certainly more things to throw at the wall! Excited to see the
| "accidental" progress
| bschne wrote:
| I love this take. Reminds me of how the Mechanical Turk
| apparently indirectly inspired someone to build a weaving
| machine b/c "how hard could it be if machines can play chess"
| -- https://x.com/gordonbrander/status/1385245747071787008?s=20
| spicyusername wrote:
| I have trouble believing this isn't just a sneaky marketing
| campaign.
| dmix wrote:
| Nothing OpenAI has released product-wise (ChatGPT, Dall-E) has
| required 'marketing'. The value speaks for itself. People
| raving about it on twitter, telling their friends/coworkers,
| and journos documenting their explorations is more than enough.
|
| If this was an extremely competitive market that'd be more
| plausible. But they enjoy some pretty serious dominance and are
| struggling to handle the growth they already have with GPT.
|
| If Q* is real, you likely wouldn't _need_ to hype up something
| that has the potential to solve math / logic problems without
| having seen the problem/solution beforehand. Something that
| novel would be hugely valuable and generate demand naturally.
| djvdq wrote:
| Of course they are doing PR stunts to keep the media talking
| about them.
|
| Remember Altman saying that they shouldn't release GPT-2
| because of it being too dangerous? It's the same thing with
| this Q* thing.
| FeepingCreature wrote:
| Because it could be used to generate spam, yes, and he was
| right about that.
|
| And to set a precedent that models should be released
| cautiously, and he was right about that too, and it is to
| our detriment that we don't take that more seriously.
| dmix wrote:
| Board member Helen Toner accused Sam/OpenAI of releasing
| GPT too early; there were people who wanted to keep it
| locked away over those concerns, which largely haven't come
| true (a lot of people don't understand how spam detection
| works and overrate the impact of deepfakes).
|
| Companies have competing interests and personalities.
| That's normal. But there is no indication that GPT was held
| back for marketing.
| lawlessone wrote:
| >The value speaks for itself.
|
| What is that though? I've seen a lot of tools created for it.
| Custom AI Characters. Things that let you have an LLM read a
| DB etc. But I haven't seen much in the way of customer-facing
| things.
| dharmab wrote:
| It's pretty good for customer support agent tools. Feed the
| LLM your company's knowledgebase and give it the context of
| the support chat/email/call transcript, and it suggests
| solutions to the agent.
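|
| A rough sketch of that pattern (`kb_search` and `llm` here are
| made-up stand-in callables, not any particular vendor's API):
|
|     # Retrieve relevant KB snippets, pack them with the support
|     # transcript into a prompt, and ask the model for a suggestion.
|     def suggest_solution(kb_search, llm, transcript):
|         snippets = kb_search(transcript, top_k=3)
|         prompt = (
|             "You assist a human support agent. Using only the "
|             "documentation below, suggest a solution.\n\n"
|             "Documentation:\n" + "\n---\n".join(snippets) +
|             "\n\nTranscript:\n" + transcript +
|             "\n\nSuggested solution:"
|         )
|         return llm(prompt)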
| dist-epoch wrote:
| > Satya: Microsoft has over a million paying Github Copilot
| users
|
| https://www.zdnet.com/article/microsoft-has-over-a-
| million-p...
| janalsncm wrote:
| > But I haven't seen much in the way of customer-facing things.
|
| How about ChatGPT? It's a game changer. It has allowed me
| to learn Rust extremely quickly since I can just ask it
| direct questions about my code. And I don't worry about
| hallucinations since the compiler is always there to "fact
| check".
|
| I'm pretty bearish on OpenAI wrappers. Low effort, zero
| moat. But that's largely irrelevant to the value of OpenAI
| products themselves.
| ghostzilla wrote:
| > People raving about it on twitter
|
| For the most part, usage of GenAI has been sharing output on
| social media. It is mind-blowingly fascinating, but the
| utility is far, far behind.
| bhhaskin wrote:
| I agree. Only thing that matters is results.
| YetAnotherNick wrote:
| I have trouble believing the whole ousting of Sam Altman was
| planned for this. But yeah, someone might have been smart enough
| to feed wrong info to the press after the whole saga was over.
| ben_w wrote:
| I definitely need to blog more. A* search with a neural network
| as the heuristic function seemed like a good idea to
| investigate... a month or two ago, and I never got around to it.
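|
| The idea in a minimal sketch (plain Python; `heuristic` is where
| a network would plug in -- note a learned heuristic is generally
| not admissible, so A*'s optimality guarantee is lost):
|
|     import heapq
|     from itertools import count
|
|     def a_star(start, goal, neighbors, heuristic):
|         # heuristic(node, goal) can be any callable, e.g. a neural
|         # net estimating cost-to-go; the counter is a tiebreaker so
|         # the heap never compares nodes or paths directly.
|         tie = count()
|         frontier = [(heuristic(start, goal), next(tie), 0, start, [start])]
|         best_g = {start: 0}
|         while frontier:
|             _, _, g, node, path = heapq.heappop(frontier)
|             if node == goal:
|                 return path
|             for nxt, cost in neighbors(node):
|                 g2 = g + cost
|                 if g2 < best_g.get(nxt, float("inf")):
|                     best_g[nxt] = g2
|                     f2 = g2 + heuristic(nxt, goal)
|                     heapq.heappush(frontier, (f2, next(tie), g2, nxt, path + [nxt]))
|         return None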
| haltist wrote:
| I have an idea for a great AI project and it's about finding the
| first logical inconsistency in an argument about a formal system
| like an LLM. I think if OpenAI can deliver that then I will
| believe they have achieved AGI.
|
| I am a techno-optimist and I believe this is possible and all I
| need is a lot of money. I think $80B would be more than
| sufficient. I will be awaiting a reply from other techno-
| optimists like Marc Andreessen and those who are techno-optimist
| adjacent like millionaires and billionaires that read HN
| comments.
| adamnemecek wrote:
| RL and A* are both approaches to dynamic programming, so this
| would not be surprising.
| jbrisson wrote:
| Imho, in order to reach AGI you have to get out of the LLM space.
| It has to be something else. Something close to biological
| plausibility.
| bob1029 wrote:
| I think big parts of the answer include time domain, multi-
| agent and iterative concepts.
|
| Language is about communication of information _between_
| parties. One instance of an LLM doing one-shot inference is not
| leveraging much of this. Only first-order semantics can really
| be explored. There is a limit to what can be communicated in a
| context of _any_ size if you only get one shot at it. Change
| over time is a critical part of our reality.
|
| Imagine if your agent could determine that it has been thinking
| about something for too long and adapt its strategy
| automatically: escalate to a higher-param model, adapt the
| context, etc.
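|
| A toy sketch of that kind of escalation loop (every name here is
| a made-up stand-in, not a real API):
|
|     import time
|
|     # Try each model in order, cheapest first; if the time budget
|     # runs out, escalate to the next (larger) model.
|     def solve_with_escalation(task, models, is_done, budget_s=30.0):
|         answer = None
|         for model in models:
|             deadline = time.monotonic() + budget_s
|             while time.monotonic() < deadline:
|                 answer = model(task, previous=answer)
|                 if is_done(answer):
|                     return answer
|         return answer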
|
| Perhaps we aren't seeking total AGI/ASI either (aka inventing
| new physics). From a business standpoint, it seems like we
| mostly have what we need now. The next ~3 months are going to
| be a hurricane in our shop.
| hackinthebochs wrote:
| LLMs as we currently understand them won't reach AGI. But AGI
| will very likely have an LLM as a component. What is language
| but a way to represent arbitrary structure? Of course that's
| relevant to AGI.
| valine wrote:
| Covering an airplane in feathers isn't going to make it fly
| faster. Biological plausibility is a red herring imho.
| foooorsyth wrote:
| The training space is more important. I don't think a general
| intelligence will spawn from text corpuses. A person only
| able to consume text to learn would be considered severely
| disabled.
|
| A significant part of intelligence comes from existence in
| meatspace and the ability to manipulate and observe that
| meatspace. A two year old learns much faster with much less
| data than any LLM.
| valine wrote:
| We already have multimodal models that take both images and
| text as input. The bulk of the training for these models
| was in text, not images. This shouldn't be surprising. Text
| is a great way of abstractly and efficiently representing
| reality. Of course those patterns are useful for making
| sense of other modalities.
|
| Beyond modeling the world, text is also a great way to
| model human thought and reason. People like to explain
| their thought process in writing. LLMs already pick up on
| and mimic chain of thought well.
|
| Contained within large datasets is crystallized thought,
| and efficient descriptions of reality that have proven
| useful for processing modalities beyond text. To me that
| seems like a great foundation for AGI.
| orbital-decay wrote:
| Definitions, again. OpenAI defines AGI as highly autonomous
| agents that can replace humans in most of the economically
| important jobs. Those don't need to look or function like
| humans.
| kelseyfrog wrote:
| A* is a red herring based on availability bias.
|
| Q* is already a thing: it's the optimal action-value function,
| described by the Bellman equation.
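|
| For reference, the Bellman optimality equation that defines it
| (standard RL notation, nothing OpenAI-specific):
|
|     Q^*(s,a) = \mathbb{E}\left[ r(s,a) + \gamma \max_{a'} Q^*(s',a') \mid s,a \right]
|
| i.e. the expected return from taking action a in state s and
| behaving optimally thereafter.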
| bertil wrote:
| Are you saying that the Bellman equations already use the
| notation Q*, or are you saying that those equations (I'm not as
| familiar with them as I should be, sorry) are the obvious
| connection behind the incoherent ramblings from Reuters?
|
| Because having similar acronyms or notations used across
| multiple contexts that end up colliding through cross-pollination
| of ideas is far too frequent these days. I once made a dictionary
| of terms used in A/B testing / Feature Flags / DevOps /
| Statistics / Econometrics, and _most_ keywords had multiple,
| incompatible meanings depending on the exact context, all
| somewhat relevant to A/B testing. Every reader came out of it so
| defeated, like language itself was broken...
| ElectricalUnion wrote:
| Can you link this dictionary here or is it proprietary?
| tnecniv wrote:
| Q* is an incredibly common notation for the above version of
| the Bellman equation. I think it's stupid to name an algorithm
| Q* for the same reason it's stupid to read too much into this:
| it's an incredibly nondescript name.
| kelseyfrog wrote:
| I'm saying that everyone already uses that notation including
| OpenAI[1].
|
| 1.
| https://spinningup.openai.com/en/latest/algorithms/ddpg.html
| janalsncm wrote:
| Is it possible they were referring to this research they
| published in May?
|
| https://openai.com/research/improving-mathematical-reasoning...
| fizx wrote:
| The most likely hypothesis I've seen for Q*:
|
| https://twitter.com/alexgraveley/status/1727777592088867059
| urbandw311er wrote:
| See also https://news.ycombinator.com/item?id=38407741
___________________________________________________________________
(page generated 2023-11-24 23:00 UTC)