[HN Gopher] Hard problems that reduce to document ranking
___________________________________________________________________
Hard problems that reduce to document ranking
Author : noperator
Score : 143 points
Date : 2025-02-25 17:37 UTC (5 hours ago)
(HTM) web link (noperator.dev)
(TXT) w3m dump (noperator.dev)
| noperator wrote:
| A concept that I've been thinking about a lot lately:
| transforming complex problems into document ranking problems to
| make them easier to solve. LLMs can assist greatly here, as I
| demonstrated at the inaugural DistrictCon this past weekend.
| lifeisstillgood wrote:
| So would this be 1600 commits, one of which fixes the bug
| (which might be easier to find with commit messages?), or a
| diff between two revisions with 1600 chunks, each chunk a
| "document"?
|
| I am trying to grok why we want to find the fix - is it to
| understand what was done so we can exploit unpatched instances
| in the wild?
|
| Also:
|
| "identifying candidate functions for fuzzing targets" - if
| every function is a document I get where the list of documents
| is, what what is the query - how do I say "find me a function
| most suitable to fuzzing"
|
| Apologies if that's brusque - trying to fit new concepts in my
| brain :-)
| noperator wrote:
| Great questions. For commits or revision diffs as documents--
| either will work. Yes, I've applied this to N-day
| vulnerability identification to support exploit development
| and offensive security testing. And yes, for fuzzing, a
| sensible approach would be to dump the exported function
| attributes (names, source/disassembled code, other relevant
| context, etc.) from a built shared library, and ask, "Which
| of these functions most likely parses complex input and may
| be a good candidate for fuzzing?" I've had some success with
| that specific approach already.
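| A minimal sketch of that workflow, assuming a hypothetical
| `ask_llm` stand-in for whatever LLM client you use; a real
| pass would also feed in per-function source or disassembly as
| described above:
|
|     import json
|     import subprocess
|
|     def ask_llm(prompt):
|         """Hypothetical stand-in; plug in your LLM client here."""
|         raise NotImplementedError
|
|     def exported_functions(lib_path):
|         """List function symbols defined in a shared library via nm."""
|         out = subprocess.run(
|             ["nm", "-D", "--defined-only", lib_path],
|             capture_output=True, text=True, check=True,
|         ).stdout
|         funcs = []
|         for line in out.splitlines():
|             parts = line.split()
|             if len(parts) == 3 and parts[1] in ("T", "t"):  # text symbols
|                 funcs.append(parts[2])
|         return funcs
|
|     def rank_fuzz_targets(lib_path):
|         """Rank exported functions as fuzzing candidates."""
|         docs = [{"id": i, "name": n}
|                 for i, n in enumerate(exported_functions(lib_path))]
|         prompt = ("Which of these functions most likely parses complex "
|                   "input and may be a good candidate for fuzzing? "
|                   "Return their ids, best first, as a JSON list.\n"
|                   + json.dumps(docs))
|         return [docs[i]["name"] for i in json.loads(ask_llm(prompt))]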
| obblekk wrote:
| The open source ranking library is really interesting. It's
| using a type of merge sort where the comparator is an LLM
| call, batched over more than two items at a time to reduce the
| number of calls (a sketch of the idea appears below).
|
| Reducing problems to document ranking is effectively a type of
| test-time search - also very interesting!
|
| I wonder if this approach could be combined with GRPO to create
| more efficient chain of thought search...
|
| https://github.com/BishopFox/raink?tab=readme-ov-file#descri...
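| A minimal sketch of that batched-comparator idea (not raink's
| actual code; `llm_order` is a hypothetical stand-in for one
| LLM call that sorts a small batch of documents):
|
|     def llm_order(batch, query):
|         """Hypothetical: one LLM call, returns `batch` sorted best-first."""
|         raise NotImplementedError
|
|     def llm_merge_sort(docs, query, batch=8):
|         """Merge sort where each 'comparison' ranks a whole batch."""
|         if len(docs) <= batch:
|             return llm_order(docs, query)  # one call sorts the base case
|         mid = len(docs) // 2
|         left = llm_merge_sort(docs[:mid], query, batch)
|         right = llm_merge_sort(docs[mid:], query, batch)
|         out = []
|         while left and right:
|             # Rank the heads of both runs together in a single call.
|             k = min(batch // 2, len(left), len(right))
|             left_ids = {d["id"] for d in left[:k]}
|             n_left = n_right = 0
|             for doc in llm_order(left[:k] + right[:k], query):
|                 out.append(doc)
|                 if doc["id"] in left_ids:
|                     n_left += 1
|                 else:
|                     n_right += 1
|                 # Once one side's window is spent, the next winner
|                 # could come from that side's unseen tail, so refill.
|                 if n_left == k or n_right == k:
|                     break
|             emitted = {d["id"] for d in out}
|             left = [d for d in left if d["id"] not in emitted]
|             right = [d for d in right if d["id"] not in emitted]
|         return out + left + right
|
| Each base-case batch costs one call instead of O(batch log
| batch) pairwise comparisons, which is where the savings come
| from.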
| rahimnathwani wrote:
| The article introducing the library explains that pairwise
| comparisons are the most reliable approach (for each pair of
| items, you ask an LLM which it prefers) but computationally
| expensive, and that a single LLM call ("rank these items in
| order") is much less reliable. So they do something in
| between: ranking small batches gives enough pairwise-style
| comparisons to produce a more reliable list (sketch below).
|
| https://news.ycombinator.com/item?id=43175658
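| For concreteness, one way to realize that middle ground (a
| sketch of the general idea, not necessarily the article's
| exact scheme): rank small shuffled batches over several passes
| and average each item's position. For n items this costs about
| passes * n/batch calls, versus n(n-1)/2 for exhaustive
| pairwise comparison.
|
|     import random
|
|     def llm_order(batch, query):
|         """Hypothetical: one LLM call, returns `batch` sorted best-first."""
|         raise NotImplementedError
|
|     def batch_rank(docs, query, batch=8, passes=10):
|         """Average normalized positions over shuffled batch rankings."""
|         docs = list(docs)
|         scores = {d["id"]: 0.0 for d in docs}
|         for _ in range(passes):
|             random.shuffle(docs)
|             for i in range(0, len(docs), batch):
|                 group = docs[i:i + batch]
|                 for pos, doc in enumerate(llm_order(group, query)):
|                     scores[doc["id"]] += pos / max(len(group) - 1, 1)
|         return sorted(docs, key=lambda d: scores[d["id"]])  # best first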
| westurner wrote:
| Ranking (information retrieval)
| https://en.wikipedia.org/wiki/Ranking_(information_retrieval...
|
| awesome-generative-information-retrieval > Re-ranking:
| https://github.com/gabriben/awesome-generative-information-r...
| rfurmani wrote:
| Very cool! This matches one of my beliefs in building tools
| for research: if you can solve the problem of predicting and
| ranking the top references for a given idea, then you've
| learned a lot about problem solving and about decomposing
| problems into their ingredients. I've been pleasantly surprised
| by how well LLMs can rank relevance, compared to supervised
| training of a relevancy score. I'll read the linked paper
| (shameless plug, here it is on my research tools site:
| https://sugaku.net/oa/W4401043313/)
| Everdred2dx wrote:
| Very interesting application of LLMs. Thanks for sharing!
| m3kw9 wrote:
| That title hurts my head to read
| mskar wrote:
| Great article - I've had similar findings! LLM-based
| "document-chunk" ranking is a core feature of PaperQA2
| (https://github.com/Future-House/paper-qa) and part of why it
| works so well for scientific Q&A compared to traditional RAG
| systems based on embedding ranking.
| noperator wrote:
| That's awesome. Will take a closer look!
| hexator wrote:
| This furthers an idea I've had recently: we (and the media)
| are focusing too much on creating value by building ever more
| complex LLMs, while vastly underestimating creative
| applications of current-generation AI.
| noperator wrote:
| Agree. I think LLMs are usually not "harnessed" correctly for
| complex, multi-step problems--hence the `raink` CLI tool:
| https://github.com/noperator/raink
| moralestapia wrote:
| Minor nitpick:
|
| Should be "document ranking reduces to these hard problems".
|
| I never knew why the convention is the way it is - it seems
| backwards to me as well, but that's how it is.
| dwringer wrote:
| "Document ranking reduces to these hard problems" would imply
| that document ranking is itself an instance of a certain group
| of hard problems. That's not what the article is saying.
| adamkhakhar wrote:
| I'm curious - why is LLM ranking preferred over cosine similarity
| from an embedding model (in the context of this specific
| problem)?
| panarky wrote:
| Because the question "does Diff A fix Vuln B" is not answered
| by the cosine distance between vector(Diff A) and vector(Vuln
| B).
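| To make that concrete (an illustrative sketch, not from the
| article): cosine similarity scores how topically close two
| embeddings are, while the question here is a judgment that
| requires reading both texts.
|
|     import math
|
|     def cosine_similarity(a, b):
|         """Angle-based closeness of two embedding vectors."""
|         dot = sum(x * y for x, y in zip(a, b))
|         norm_a = math.sqrt(sum(x * x for x in a))
|         norm_b = math.sqrt(sum(y * y for y in b))
|         return dot / (norm_a * norm_b)
|
|     # A diff that merely touches the vulnerable file and a diff that
|     # actually fixes the bug can both embed close to the advisory
|     # text, so similar cosine scores won't separate them. Judging
|     # "does Diff A fix Vuln B" needs a prompt over both texts, e.g.:
|     #   "Advisory: {vuln}\nDiff: {diff}\nDoes this diff fix it?"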
| antirez wrote:
| One interesting thing about LLMs, which also relates to why
| chain of thought works so well, is that they are good at
| sampling (saying a lot of things about a problem) and, when
| shown N solutions, good at pointing at the potentially best
| one. They do both of these things better than zero-shot "tell
| me how to do that". So CoT is basically search inside the
| representation space plus ranking. This idea leverages
| something LLMs are clearly able to do pretty well.
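| A minimal sketch of that sample-then-rank loop (`ask_llm` is
| again a hypothetical stand-in for your LLM client):
|
|     def ask_llm(prompt):
|         """Hypothetical stand-in; plug in your LLM client here."""
|         raise NotImplementedError
|
|     def sample_then_rank(problem, n=5):
|         """Sample N candidate solutions, then let the model pick."""
|         candidates = [ask_llm(f"Propose a solution:\n{problem}")
|                       for _ in range(n)]
|         numbered = "\n\n".join(f"[{i}] {c}"
|                                for i, c in enumerate(candidates))
|         choice = ask_llm(
|             f"Problem:\n{problem}\n\n"
|             f"Candidate solutions:\n{numbered}\n\n"
|             "Reply with only the number of the best candidate."
|         )
|         return candidates[int(choice.strip().strip("[]"))]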
___________________________________________________________________
(page generated 2025-02-25 23:00 UTC)