[HN Gopher] Creating a LLM-as-a-Judge That Drives Business Results
___________________________________________________________________
Creating a LLM-as-a-Judge That Drives Business Results
Author : thenameless7741
Score : 54 points
Date : 2024-10-30 14:25 UTC (8 hours ago)
(HTM) web link (hamel.dev)
(TXT) w3m dump (hamel.dev)
| jerpint wrote:
| The biggest problem these days is that it's very easy to hack
| together a solution for a problem that, at first glance, seems to
| work just fine. Understanding the limits of the system is the
| hard part, especially since LLMs can't know when they don't know.
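|
| One concrete way to probe those limits: score the judge against
| a set of expert-labeled examples and read the disagreements. A
| minimal sketch in Python (`judge` here is a stand-in for a real
| model call, not any particular API):
|
|     def judge(example: str) -> str:
|         # Placeholder for an LLM judge call; a real version
|         # would prompt a model and parse its verdict.
|         return "pass"
|
|     expert_labels = [
|         ("The refund was processed twice.", "fail"),
|         ("Here is your booking confirmation.", "pass"),
|     ]
|
|     disagreements = [(t, label) for t, label in expert_labels
|                      if judge(t) != label]
|     agreement = 1 - len(disagreements) / len(expert_labels)
|     print(f"judge/expert agreement: {agreement:.0%}")
|     for t, label in disagreements:
|         print(f"judge disagreed with expert ({label}): {t!r}")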
| trod123 wrote:
| I second this, though it's a bit unclear to anyone who isn't a
| domain expert in systems or systems organization.
|
| Defining the problem and identifying constraints is always the
| hardest part, and it's always different for each application.
| You also never know what you don't know when a project starts.
|
| The process is inevitably a constant feedback loop of
| discovery, testing, and discarding irrational or irrelevant
| results until you get down to first principles or the
| requirements actually needed.
|
| Computers, as a general rule, can't do this: at the lowest
| level of the von Neumann architecture, a machine can't tell
| truth from falsity when the inputs are the same (i.e.
| determinism as a property is broken). Automation breaks in
| similar ways.
|
| The encoded weights are approximations, and approximations are
| just that: approximations, with no thought process behind them.
| While you can make a very convincing simulacrum, you'll never
| get a true expert, and the process is not a net benefit
| overall, since you end up creating cascading problems later
| that cannot be solved.
|
| Put another way: expertise is built only by working the
| problems, so when there is no economic incentive to become a
| true expert in the first place, the knowledge is not passed on,
| and it is lost when the experts age and die.
|
| Since at best you may only be able to target what amounts to
| entry-level roles, and these roles are how people become
| experts, any adoption that replaces these workers guarantees
| this ultimately destructive outcome with any haphazard attempt.
| Even if you can't meet that level of production initially, the
| mere claim is sufficient to cause damage to society as a whole.
| More often than not it fundamentally breaks the social contract
| in uncontrollable ways.
|
| The article takes the approach of leveraging domain experts,
| most likely copying them in various ways, but if we're being
| real, that is doomed to failure too, for a number of reasons
| that are much too long to go into here.
|
| Needless to say, true domain experts, and really any rational
| person, won't knowingly volunteer anything related to their
| profession that will be used to economically destroy their
| future prospects. When they find out after the fact, they stop
| contributing or volunteering completely, as seen on Reddit.
| These people are also more likely to sabotage such systems in
| subtle ways.
|
| This dynamic may also cause the exact opposite, where the truly
| gifted leave the profession entirely and you get extreme brain
| drain, as depicted in Atlas Shrugged.
|
| People can and do go on strike, withdrawing the only thing of
| value they have that cannot be taken. We are already seeing the
| beginning of this type of fallout in the tech sector. August
| unemployment for tech was 7%(?), against national unemployment
| of 1.5%, roughly 4.6x the national average, and this at peak
| seasonal hiring (with Nov-Mar often being hiring freezes). Tech
| has historically been bulletproof against interest rate
| increases, so the underlying cause is not interest rates (as
| some claim). The only recent change big enough to make a public
| splash is AI, which is a Pandora's box.
|
| When employers cannot differentiate the gifted from the non-
| gifted, there is no work for the intelligent, and these people
| always have more options than others. They'll leave their
| chosen profession if they can't find work, and will be unlikely
| to return to it even if things turn around later.
|
| Intelligent people always ask whether they should be doing
| something, whereas evil (destructive/blind) people focus on
| whether they can.
|
| The main difference is a focus on controlling the consequences
| of their actions so they don't destroy their children's future.
| bzmrgonz wrote:
| This is a brilliant write-up, very dense but very detailed;
| thank you for taking the time (assuming you didn't employ AI...
| LOL). So listen, assuming you are the author: there is an open
| source case management software called ArkCase. I engaged them
| as a possible flagship platform at a law firm. Going through
| their presentation, I noticed that the platform is extremely
| customizable and flexible. So much so, that I think that in
| itself is the reason people don't adopt it in droves:
| essentially too permissive. However, I think it would be a
| great backend component to a "Rechat"-style LLM front end. Is
| there such a need? To have a backend data repository that
| interacts with a front-end LLM that employees drive in pure
| prose and directives? What does the current backend look like
| for services such as Rechat and other chat-based LLM agents? I
| bring this up because ArkCase is so flexible that it can work
| across broad industries and needs, from managing a high school
| athletic department (a dossier and bio for each staff member
| and player) to the entire US Office of Personnel (Alfresco and
| ArkCase for security clearance investigations). The idea is
| that by introducing an agent LLM as the front end, the learning
| curve could be flattened and the extreme flexibility
| abstracted, as in the sketch below.
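|
| A rough sketch of that shape in Python, with all names
| hypothetical (this is not ArkCase's actual API): the LLM front
| end maps a prose directive onto a call against the structured
| backend.
|
|     TOOLS = {
|         "create_case": lambda title, assignee:
|             f"created case {title!r} for {assignee}",
|         "find_cases": lambda query:
|             f"searching cases matching {query!r}",
|     }
|
|     def llm_route(directive: str) -> dict:
|         # Placeholder for the model call that maps prose to a
|         # tool invocation; a real system would use the LLM's
|         # function/tool calling.
|         return {"tool": "create_case",
|                 "args": {"title": directive, "assignee": "intake"}}
|
|     def handle(directive: str) -> str:
|         call = llm_route(directive)
|         return TOOLS[call["tool"]](**call["args"])
|
|     print(handle("Open a dossier for the new athletic staff hire"))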
| petesergeant wrote:
| I'm going through almost exactly this process at the moment, and
| this article is excellent. Aligns with my experience while adding
| a bunch of good ideas I hadn't thought of / discovered yet. A+,
| would read again.
| Lerc wrote:
| There are a few broad areas of risk in AI.
|
| 1. Enablement goes both ways: bad actors can also be enabled by
| AI.
|
| 2. Accuracy of suggestions. Information provided by AI may be
| incorrect, be it code, how to brush one's teeth, or the height
| of Arnold Schwarzenegger. At worst, AI can respond against the
| user's interests if the creator of the AI has configured it to
| do so.
|
| 3. Accuracy of determinations. LLM-as-a-Judge falls under this
| criterion. This is one of the areas where a single error can
| magnify the most; a minimal sketch of such a determination
| follows this list.
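|
| Assuming an arbitrary text-completion callable (`complete` is
| whatever client you use, not a specific API), a judge
| determination can be as small as this:
|
|     JUDGE_PROMPT = """You are evaluating a customer-support reply.
|     Answer 'pass' or 'fail' on the first line, then give a
|     one-line critique.
|
|     Reply to evaluate:
|     {reply}
|     """
|
|     def judge_reply(reply, complete):
|         # `complete` sends a prompt to a model and returns its
|         # text; swap in a real client here.
|         raw = complete(JUDGE_PROMPT.format(reply=reply))
|         verdict, _, critique = raw.partition("\n")
|         return verdict.strip().lower(), critique.strip()
|
|     # Everything downstream keys off the verdict, which is why
|     # a single wrong determination magnifies: it gates all
|     # later actions.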
|
| This post says: _What about guardrails?_
|
| _Guardrails are a separate but related topic. They are a way to
| prevent the LLM from saying/doing something harmful or
| inappropriate. This blog post focuses on helping you create a
| judge that's aligned with business goals, especially when
| starting out._
|
| That seems woefully inadequate.
|
| When using AI to make determinations, there have to be
| guardrails. Having looked at drafts of legislation and position
| statements of governments, many are looking at legally
| requiring that any implementers of AI systems that make
| determinations _must_ implement processes to deal with the
| situation where the AI makes an incorrect determination. To be
| effective, this should be a process that can be initiated by
| the individuals affected by the determination, along the lines
| of the sketch below.
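|
| A rough sketch of that recourse process, with all names
| hypothetical: every automated determination is recorded under
| an ID that is returned to the affected individual, who can then
| open an appeal that routes the case to a human reviewer.
|
|     import uuid
|     from dataclasses import dataclass, field
|
|     @dataclass
|     class Determination:
|         subject: str
|         verdict: str
|         appeal_open: bool = False
|         id: str = field(default_factory=lambda: uuid.uuid4().hex)
|
|     LOG: dict[str, Determination] = {}
|
|     def record(subject: str, verdict: str) -> str:
|         d = Determination(subject, verdict)
|         LOG[d.id] = d
|         return d.id  # handed back with the decision notice
|
|     def appeal(determination_id: str) -> None:
|         # Initiated by the affected individual; flags the case
|         # for human review instead of letting the automated
|         # verdict stand unexamined.
|         LOG[determination_id].appeal_open = True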
| nine_zeros wrote:
| > Having looked at drafts of legislation and position
| statements of governments, many are looking at legally
| requiring that any implementers of AI systems that make
| determinations must implement processes to deal with the
| situation where the AI makes an incorrect determination
|
| The real legislation we need concerns liability. Who is liable
| for suffering caused by LLM inaccuracies?
|
| I think liability should fall on the corporations selling the
| LLM as a solution.
|
| If a person gets arrested because the police bought a fuzzy LLM
| solution, and this causes unnecessary grief to that individual,
| the seller of the LLM service must compensate the individual
| with 4x the median income of their metropolitan area, for the
| duration of the harm caused.
___________________________________________________________________
(page generated 2024-10-30 23:01 UTC)