[HN Gopher] Creating a LLM-as-a-Judge That Drives Business Results
       ___________________________________________________________________
        
       Creating a LLM-as-a-Judge That Drives Business Results
        
       Author : thenameless7741
       Score  : 54 points
       Date   : 2024-10-30 14:25 UTC (8 hours ago)
        
 (HTM) web link (hamel.dev)
 (TXT) w3m dump (hamel.dev)
        
       | jerpint wrote:
       | The biggest problem these days is that it's very easy to hack
       | together a solution for a problem that, at first glance, seems to
       | work just fine. Understanding the limits of the system is the
       | hard part, especially since LLMs can't know when they don't know.
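       | 
       | One way to probe those limits (a minimal sketch, not from the
       | article; the model, prompt, and labeled examples below are all
       | assumptions) is to score the judge against a handful of human-
       | labeled cases and measure agreement before trusting it:
       | 
       |     # pip install openai scikit-learn
       |     from openai import OpenAI
       |     from sklearn.metrics import cohen_kappa_score
       | 
       |     client = OpenAI()
       | 
       |     # Hypothetical human-labeled examples: (response, verdict)
       |     labeled = [
       |         ("Your refund was issued; allow 5 business days.", "good"),
       |         ("No idea, figure it out yourself.", "bad"),
       |     ]
       | 
       |     def judge(response: str) -> str:
       |         """Ask the LLM judge for a binary verdict on one reply."""
       |         out = client.chat.completions.create(
       |             model="gpt-4o",
       |             messages=[{
       |                 "role": "user",
       |                 "content": "Answer only 'good' or 'bad'. Is this "
       |                            f"support reply acceptable?\n{response}",
       |             }],
       |         )
       |         return out.choices[0].message.content.strip().lower()
       | 
       |     human = [label for _, label in labeled]
       |     model = [judge(text) for text, _ in labeled]
       |     # Low agreement means the judge's limits are not yet
       |     # understood well enough to rely on its verdicts.
       |     print(cohen_kappa_score(human, model))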
        
         | trod123 wrote:
         | I second this, though it's a bit unclear to anyone who isn't a
         | domain expert in systems or systems organization.
         | 
         | Defining the problem and identifying the constraints is always
         | the hardest part, and it's always different for each
         | application. You also never know what you don't know when a
         | project starts.
         | 
         | The process is inevitably a constant feedback loop of
         | discovery, testing, and discarding irrational or irrelevant
         | results until you get down to first principles or the
         | requirements you actually need.
         | 
         | Computers, as a general rule, can't do this: the lowest level
         | of the von Neumann architecture can't tell truth from falsity
         | when the inputs are the same (i.e. determinism as a property is
         | broken). Automation breaks in similar ways.
         | 
         | Approximations, which is what the encoded weights are, are just
         | that: approximations, without a thought process. While you can
         | make a very convincing simulacrum, you'll never get a true
         | expert, and the process is not a net benefit overall since you
         | end up creating cascading problems later that cannot be solved.
         | 
         | Put another way: when there is no economic incentive to become
         | a true expert in the first place, and expertise is only built
         | by working the problems, the knowledge is not passed on, and is
         | then lost when the experts age and die.
         | 
         | Since at best you may only be able to target what amounts to
         | entry-level roles, and these roles are what people use to
         | become experts, any adoption that replaces these workers
         | guarantees this ultimately destructive outcome, however
         | haphazard the attempt. Even if you can't meet that level of
         | production initially, the mere claim is sufficient to cause
         | damage to society as a whole. It more often than not
         | fundamentally breaks the social contract in uncontrollable
         | ways.
         | 
         | The article takes the approach of leveraging domain experts,
         | most likely copying them in various ways, but if we're being
         | real, that is doomed to failure too, for a number of reasons
         | that are much too long to go into here.
         | 
         | Needless to say, true domain experts, and really any rational
         | person, won't knowingly volunteer anything related to their
         | profession that will be used to economically destroy their
         | future prospects. When they find out after the fact, they stop
         | contributing or volunteering completely, as seen on Reddit.
         | These people are also more likely to sabotage such systems in
         | subtle ways.
         | 
         | This dynamic may also cause the exact opposite, where the truly
         | gifted leave the profession entirely and you get extreme brain
         | drain, as depicted in Atlas Shrugged.
         | 
         | People can and do go on strike, withdrawing the only thing of
         | value they have that cannot be taken. We are already seeing the
         | beginning of this type of fallout in the Tech sector. August
         | unemployment for Tech was around 7% versus roughly 1.5%
         | nationally, about 4.6x the national average, and this is at
         | peak seasonal hiring (with Nov-Mar often being hiring freezes).
         | Tech historically has not been impacted by interest rate
         | increases; it has been bulletproof against them, so the
         | underlying cause is not interest rates (as some claim). The
         | only recent change big enough to cause a splash publicly is AI,
         | which is a Pandora's box.
         | 
         | When employers cannot differentiate the gifted from the non-
         | gifted, there is no work for the intelligent, and these people
         | always have more options than others. They'll leave their
         | chosen profession if they can't find work, and will be unlikely
         | to return to it even if things turn around later.
         | 
         | Intelligent people always ask whether they should be doing
         | something, whereas evil (destructive/blind) people focus on
         | whether they can.
         | 
         | The main difference is a focus on controlling the consequences
         | of their actions so they don't destroy their children's future.
        
       | bzmrgonz wrote:
       | This is a brilliant write-up, very thick but very detailed; thank
       | you for taking the time (assuming you didn't employ AI.. LOL). So
       | listen, assuming you are the author: there is an open source case
       | management platform called ArkCase. I engaged them as a possible
       | flagship platform at a law firm. Going through their
       | presentation, I noticed that the platform is extremely
       | customizable and flexible. So much so, in fact, that I think that
       | in itself is the reason people don't adopt it in droves; it is
       | essentially too permissive. However, I think it would be a great
       | backend component to a "Rechat"-style LLM front end. Is there
       | such a need, to have a backend data repository that interacts
       | with a front-end LLM that employees address in pure prose and
       | directives? What does the current backend look like for services
       | such as Rechat and other chat-based LLM agents? I bring this up
       | because ArkCase is so flexible that it can work across broad
       | industries and needs, from managing a high school athletic
       | department (dossier and bio on each staff member and player) to
       | the entire US Office of Personnel (Alfresco and ArkCase for
       | security clearance investigations). The idea would be that by
       | introducing an agent LLM as the front end, the learning curve
       | could be flattened and the extreme flexibility abstracted away.
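       | 
       | To make the shape of that concrete (a hypothetical sketch only:
       | the endpoint, fields, and tool schema below are assumptions, not
       | taken from ArkCase's or Rechat's actual APIs), the front end is
       | typically an LLM that translates prose into tool calls against
       | the case repository:
       | 
       |     # pip install openai requests
       |     import json
       |     import requests
       |     from openai import OpenAI
       | 
       |     client = OpenAI()
       |     # Assumed, ArkCase-like endpoint for illustration only
       |     CASE_API = "https://arkcase.example.com/api/cases"
       | 
       |     def search_cases(query: str) -> str:
       |         """Query the case-management backend for records."""
       |         resp = requests.get(CASE_API, params={"q": query},
       |                             timeout=10)
       |         return json.dumps(resp.json())
       | 
       |     tools = [{
       |         "type": "function",
       |         "function": {
       |             "name": "search_cases",
       |             "description": "Search the case management backend",
       |             "parameters": {
       |                 "type": "object",
       |                 "properties": {"query": {"type": "string"}},
       |                 "required": ["query"],
       |             },
       |         },
       |     }]
       | 
       |     def ask(prompt: str) -> str:
       |         """Let the LLM decide whether to hit the backend."""
       |         messages = [{"role": "user", "content": prompt}]
       |         first = client.chat.completions.create(
       |             model="gpt-4o", messages=messages, tools=tools)
       |         msg = first.choices[0].message
       |         if not msg.tool_calls:   # no backend lookup needed
       |             return msg.content
       |         call = msg.tool_calls[0]
       |         args = json.loads(call.function.arguments)
       |         messages.append(msg)
       |         messages.append({"role": "tool",
       |                          "tool_call_id": call.id,
       |                          "content": search_cases(**args)})
       |         final = client.chat.completions.create(
       |             model="gpt-4o", messages=messages)
       |         return final.choices[0].message.content
       | 
       |     print(ask("Which open cases need a clearance interview?"))
       | 
       | The extreme flexibility then lives behind the tool definitions
       | rather than in whatever UI the employee sees.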
        
       | petesergeant wrote:
       | I'm going through almost exactly this process at the moment, and
       | this article is excellent. Aligns with my experience while adding
       | a bunch of good ideas I hadn't thought of / discovered yet. A+,
       | would read again.
        
       | Lerc wrote:
       | There are a few broad areas of risk in AI.
       | 
       | 1. Enabling goes both ways; bad actors can also be enabled by
       | AI.
       | 
       | 2. Accuracy of suggestions. Information provided by AI may be
       | incorrect, be it code, how to brush one's teeth, or the height
       | of Arnold Schwarzenegger. At worst, AI can respond against the
       | user's interests if the creator of the AI has configured it to
       | do so.
       | 
       | 3. Accuracy of determinations. LLM-as-a-Judge falls under this
       | category. This is one of the areas where a single error can be
       | magnified the most.
       | 
       | This post says: _What about guardrails?_
       | 
       |  _Guardrails are a separate but related topic. They are a way to
       | prevent the LLM from saying/doing something harmful or
       | inappropriate. This blog post focuses on helping you create a
       | judge that's aligned with business goals, especially when
       | starting out._
       | 
       | That seems woefully inadequate.
       | 
       | When using AI to make determinations, there have to be
       | guardrails. Having looked at drafts of legislation and position
       | statements from governments, many are looking at legally
       | requiring that implementers of AI systems that make
       | determinations _must_ implement processes to deal with the
       | situation where the AI makes an incorrect determination. To be
       | effective, this should be a process that can be initiated by the
       | individuals affected by the determination.
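       | 
       | As a rough illustration of what such a process could look like
       | in code (a hypothetical sketch; the record fields and class
       | names are assumptions, not drawn from the article or from any
       | legislation), every judge verdict is logged, can be contested by
       | the affected individual, and is overridden by a human reviewer:
       | 
       |     from dataclasses import dataclass, field
       |     from datetime import datetime, timezone
       | 
       |     @dataclass
       |     class Determination:
       |         subject_id: str
       |         verdict: str     # what the LLM judge decided
       |         rationale: str   # the judge's critique, kept for audit
       |         made_at: datetime = field(
       |             default_factory=lambda: datetime.now(timezone.utc))
       |         contested: bool = False
       |         human_verdict: str | None = None  # set on human review
       | 
       |     class DeterminationLog:
       |         """Keeps every decision auditable and contestable."""
       | 
       |         def __init__(self) -> None:
       |             self._records: dict[str, Determination] = {}
       | 
       |         def record(self, det: Determination) -> None:
       |             self._records[det.subject_id] = det
       | 
       |         def contest(self, subject_id: str) -> Determination:
       |             """Initiated by the affected individual."""
       |             det = self._records[subject_id]
       |             det.contested = True
       |             return det
       | 
       |         def resolve(self, subject_id: str, verdict: str) -> None:
       |             """A human reviewer's verdict overrides the LLM's."""
       |             self._records[subject_id].human_verdict = verdict
       | 
       |     # Usage: log the judge's output, let the subject contest it,
       |     # and treat the human verdict as final.
       |     log = DeterminationLog()
       |     log.record(Determination("case-42", "reject", "missing docs"))
       |     log.contest("case-42")
       |     log.resolve("case-42", "approve")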
        
         | nine_zeros wrote:
         | > Having looked at drafts of legislation and position
         | statements of governments, many are looking at legally
         | requiring that any implementers of AI systems that make
         | determinations must implement processes to deal with the
         | situation where the AI makes an incorrect determination
         | 
         | The real legislation we need is liability. Who is liable for
         | suffering caused by LLM inaccuracies?
         | 
         | I think liability should be on the corporations selling the
         | LLM as a solution.
         | 
         | If a person gets arrested because the police were sold a fuzzy
         | LLM solution, and this causes unnecessary grief to the
         | individual, the seller of the LLM service must compensate that
         | individual with 4x the median income of their metropolitan
         | area for the duration of the harm caused.
        
       ___________________________________________________________________
       (page generated 2024-10-30 23:01 UTC)