[HN Gopher] Hard problems that reduce to document ranking
       ___________________________________________________________________
        
       Hard problems that reduce to document ranking
        
       Author : noperator
       Score  : 143 points
       Date   : 2025-02-25 17:37 UTC (5 hours ago)
        
 (HTM) web link (noperator.dev)
 (TXT) w3m dump (noperator.dev)
        
       | noperator wrote:
       | A concept that I've been thinking about a lot lately:
       | transforming complex problems into document ranking problems to
        | make them easier to solve. LLMs can assist greatly here, as I
        | demonstrated at the inaugural DistrictCon this past weekend.
        
         | lifeisstillgood wrote:
          | So would this be 1600 commits, one of which fixes the bug
          | (which might be easier with commit messages?), or is this a
          | diff between two revisions, with 1600 chunks, each chunk a
          | "document"?
         | 
         | I am trying to grok why we want to find the fix - is it to
         | understand what was done so we can exploit unpatched instances
         | in the wild?
         | 
         | Also also
         | 
         | "identifying candidate functions for fuzzing targets" - if
         | every function is a document I get where the list of documents
         | is, what what is the query - how do I say "find me a function
         | most suitable to fuzzing"
         | 
         | Apologies if that's brusque - trying to fit new concepts in my
         | brain :-)
        
           | noperator wrote:
           | Great questions. For commits or revision diffs as documents--
           | either will work. Yes, I've applied this to N-day
           | vulnerability identification to support exploit development
           | and offensive security testing. And yes, for fuzzing, a
           | sensible approach would be to dump the exported function
           | attributes (names, source/disassembled code, other relevant
           | context, etc.) from a built shared library, and ask, "Which
           | of these functions most likely parses complex input and may
           | be a good candidate for fuzzing?" I've had some success with
           | that specific approach already.
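            | 
            | A rough sketch of that setup (illustrative only, not the
            | exact tooling from the talk), assuming a Linux shared
            | library and binutils' nm; the library path and query are
            | placeholders:
            | 
            |     # Each exported code symbol becomes one "document"
            |     # for ranking against the fuzzing query.
            |     import subprocess
            | 
            |     def exported_functions(lib_path):
            |         out = subprocess.run(
            |             ["nm", "-D", "--defined-only", lib_path],
            |             capture_output=True, text=True, check=True,
            |         ).stdout
            |         return [parts[2] for parts in
            |                 (l.split() for l in out.splitlines())
            |                 if len(parts) == 3 and parts[1] in "Tt"]
            | 
            |     docs = exported_functions("libfoo.so")  # placeholder
            |     query = ("Which of these functions most likely "
            |              "parses complex input and may be a good "
            |              "candidate for fuzzing?")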
        
       | obblekk wrote:
        | The open source ranking library is really interesting. It
        | uses a variant of merge sort where the comparator is an LLM
        | call (comparing batches larger than two to reduce the number
        | of calls); a rough sketch of the idea is below.
       | 
       | Reducing problems to document ranking is effectively a type of
       | test-time search - also very interesting!
       | 
       | I wonder if this approach could be combined with GRPO to create
       | more efficient chain of thought search...
       | 
       | https://github.com/BishopFox/raink?tab=readme-ov-file#descri...
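        | 
        | A minimal sketch of that merge-sort idea (hypothetical stub,
        | not raink's actual implementation): leaf groups are ordered
        | by a single listwise LLM call, then merged.
        | 
        |     # llm_rank_batch stands in for one listwise LLM call
        |     # ("order these docs by relevance to the query"); it is
        |     # stubbed with lexicographic order so the sketch runs
        |     # offline.
        |     def llm_rank_batch(query, docs):
        |         return sorted(docs)
        | 
        |     def llm_merge_sort(query, docs, batch=4):
        |         if len(docs) <= batch:
        |             return llm_rank_batch(query, docs)
        |         mid = len(docs) // 2
        |         left = llm_merge_sort(query, docs[:mid], batch)
        |         right = llm_merge_sort(query, docs[mid:], batch)
        |         merged = []
        |         while left and right:
        |             # Comparing run heads in one call (batchable)
        |             # amortizes cost vs. strict pairwise comparisons.
        |             best = llm_rank_batch(query,
        |                                   [left[0], right[0]])[0]
        |             merged.append(left.pop(0) if best == left[0]
        |                           else right.pop(0))
        |         return merged + left + right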
        
         | rahimnathwani wrote:
          | The article introducing the library explains that pairwise
          | comparisons are the most reliable (i.e. for each pair of
          | items you ask an LLM which it prefers) but computationally
          | expensive. A single LLM call ("rank these items in order")
          | is much less reliable. So they do something in between that
          | yields enough pairwise comparisons for a more reliable list
          | (rough sketch below).
         | 
         | https://news.ycombinator.com/item?id=43175658
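          | 
          | One way that middle ground can look (a guess at the shape,
          | not the library's exact algorithm): rank many small
          | shuffled batches with one listwise call each, then
          | aggregate each item's average position.
          | 
          |     import random
          |     from collections import defaultdict
          | 
          |     def llm_rank_batch(query, docs):
          |         return sorted(docs)  # stand-in for an LLM call
          | 
          |     def batched_rank(query, docs, batch=8, passes=5):
          |         pos = defaultdict(list)
          |         for _ in range(passes):
          |             order = random.sample(docs, len(docs))
          |             for i in range(0, len(order), batch):
          |                 group = order[i:i + batch]
          |                 ranked = llm_rank_batch(query, group)
          |                 for p, doc in enumerate(ranked):
          |                     # Normalize so partial batches compare
          |                     # fairly with full ones.
          |                     pos[doc].append(p / max(len(group) - 1, 1))
          |         # Lower mean position = ranked higher overall.
          |         return sorted(docs, key=lambda d:
          |                       sum(pos[d]) / len(pos[d]))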
        
       | westurner wrote:
       | Ranking (information retrieval)
       | https://en.wikipedia.org/wiki/Ranking_(information_retrieval...
       | 
       | awesome-generative-information-retrieval > Re-ranking:
       | https://github.com/gabriben/awesome-generative-information-r...
        
       | rfurmani wrote:
       | Very cool! This is also one of my beliefs in building tools for
        | research: if you can solve the problem of predicting and
        | ranking the top references for a given idea, then you've
        | learned a lot about problem solving and about decomposing
        | problems into their ingredients. I've been pleasantly surprised
       | by how well LLMs can rank relevance, compared to supervised
       | training of a relevancy score. I'll read the linked paper
       | (shameless plug, here it is on my research tools site:
       | https://sugaku.net/oa/W4401043313/)
        
       | Everdred2dx wrote:
       | Very interesting application of LLMs. Thanks for sharing!
        
       | m3kw9 wrote:
       | That title hurts my head to read
        
       | mskar wrote:
        | Great article, I've had similar findings! LLM-based "document-
        | chunk" ranking is a core feature of PaperQA2
       | (https://github.com/Future-House/paper-qa) and part of why it
       | works so well for scientific Q&A compared to traditional
       | embedding-ranking based RAG systems.
        
         | noperator wrote:
         | That's awesome. Will take a closer look!
        
       | hexator wrote:
        | This furthers an idea I've had recently: we (and the media)
        | are focusing too much on creating value by making ever more
        | complex LLMs, while vastly underestimating creative
        | applications of current-generation AI.
        
         | noperator wrote:
         | Agree. I think LLMs are usually not "harnessed" correctly for
         | complex, multi-step problems--hence the `raink` CLI tool:
         | https://github.com/noperator/raink
        
       | moralestapia wrote:
        | Minor nitpick,
        | 
        | Should be "document ranking reduces to these hard problems".
        | 
        | I never knew why the convention is like that; it seems
        | backwards to me as well, but that's how it is.
        
         | dwringer wrote:
         | "Document ranking reduces to these hard problems" would imply
         | that document ranking is itself an instance of a certain group
         | of hard problems. That's not what the article is saying.
        
       | adamkhakhar wrote:
       | I'm curious - why is LLM ranking preferred over cosine similarity
       | from an embedding model (in the context of this specific
       | problem)?
        
         | panarky wrote:
         | Because the question "does Diff A fix Vuln B" is not answered
         | by the cosine distance between vector(Diff A) and vector(Vuln
         | B).
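          | 
          | To make that concrete (toy numbers and a hypothetical
          | prompt, not any particular API): embedding similarity
          | measures topical closeness, while an LLM prompt can ask
          | the actual question.
          | 
          |     import math
          | 
          |     def cosine_sim(a, b):
          |         dot = sum(x * y for x, y in zip(a, b))
          |         na = math.sqrt(sum(x * x for x in a))
          |         nb = math.sqrt(sum(y * y for y in b))
          |         return dot / (na * nb)
          | 
          |     # Toy vectors standing in for embedding-model output.
          |     # A diff that merely *mentions* the vulnerable code can
          |     # embed close to the vuln description without fixing it.
          |     v_diff, v_vuln = [0.9, 0.1, 0.3], [0.8, 0.2, 0.4]
          |     print(cosine_sim(v_diff, v_vuln))  # high either way
          | 
          |     # The LLM route asks the causal question directly:
          |     prompt = ("Does this diff fix this vulnerability? "
          |               "Answer yes or no.\n\nDIFF: ...\n\nVULN: ...")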
        
       | antirez wrote:
        | One interesting thing about LLMs, which is also related to
        | why chain of thought works so well, is that they are good at
        | sampling (saying a lot of things about a problem) and good,
        | when shown N solutions, at pointing to the potentially better
        | one. They do both of these better than a zero-shot "tell me
        | how to do that". So CoT is basically search inside the
        | representation space plus ranking, and this idea leverages
        | something LLMs can clearly do pretty well.
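        | 
        | A minimal sketch of that sample-then-rank pattern (complete
        | is a hypothetical stand-in for any chat-completion call):
        | 
        |     # Draw N candidates at high temperature, then ask the
        |     # model to pick the best one at temperature 0.
        |     def complete(prompt, temperature=1.0):
        |         return f"candidate(T={temperature})"  # offline stub
        | 
        |     def sample_then_rank(problem, n=8):
        |         cands = [complete(f"Solve: {problem}", 1.0)
        |                  for _ in range(n)]
        |         numbered = "\n".join(f"{i}. {c}"
        |                              for i, c in enumerate(cands))
        |         pick = complete("Which numbered solution is best? "
        |                         "Reply with the number only.\n"
        |                         + numbered, 0.0)
        |         return (cands[int(pick)]
        |                 if pick.isdigit() and int(pick) < len(cands)
        |                 else cands[0])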
        
       ___________________________________________________________________
       (page generated 2025-02-25 23:00 UTC)