[HN Gopher] Konwinski Prize
       ___________________________________________________________________
        
       Konwinski Prize
        
       Author : tosh
       Score  : 109 points
       Date   : 2024-12-14 12:10 UTC (2 days ago)
        
 (HTM) web link (andykonwinski.com)
 (TXT) w3m dump (andykonwinski.com)
        
       | cs702 wrote:
       | Fabulous. A big round of applause for Andy Konwinski and the
       | SWE-bench folks!
        
       | thih9 wrote:
       | The Kaggle competition page has more details:
       | https://www.kaggle.com/competitions/konwinski-prize
       | 
       | The prizes scale with the model's score; the total prize pool is
       | between $100,000 and $1,225,000, depending on the top scores.
        
         | optimalsolver wrote:
         | >$1M for the AI that can close 90% of new GitHub issues
         | 
         | If your AI can do this, it's worth several orders of magnitude
         | more. Just FYI.
        
           | segmondy wrote:
           | Exactly, I'll personally buy it for $2Million for anyone that
           | can get it and assign me the full code/weight and rights.
        
             | noch wrote:
             | > I'll personally buy it for $2Million for anyone that can
             | get it and assign me the full code/weight and rights.
             | 
             | If you are serious, you should put the funds in an escrow
             | contract and announce a bounty.
             | 
             | There are many brilliant people who would work on this for
             | you.
        
             | andyk wrote:
             | I hope the competition will inspire people to make
             | breakthroughs in the open, so I won't take any rights to
             | the IP, instead the winning solutions must use open source
             | code and open weight models.
        
           | senko wrote:
           | And Linux kernel, curl, SQLite and many other open source
           | software are worth infinitely more than the purchase price.
           | 
           | Also, you cut off the "from the benchmark" part; this doesn't
           | expect it to solve any random GitHub issue, just the ones
           | from the (presumably manually vetted and cleaned up) bench
           | dataset.
        
             | minimaxir wrote:
             | Linux kernel, curl, and SQLite don't require compute costs so
             | significant that development is out of reach of hobbyists and
             | within reach only of organizations expecting a positive ROI.
        
               | senko wrote:
               | The cost of Linux kernel development alone has been
               | estimated at a few $B (https://dwheeler.com/essays/linux-
               | kernel-cost.html), current figure is probably over tens
               | of billions.
               | 
               | Also, the prize doesn't require you to train a new
               | foundational model, just that whatever you use is open
               | weights or open source.
               | 
               | Theoretically, one might get away with a Llama3.3 (or any
               | other model which you think makes sense) with a cleverly
               | designed agentic system and a fresh codebase-
               | understanding approach, with minimal compute cost.
               | 
               | (ok, probably not that easy, but just saying there's much
               | more to AI coding than the underlying model)
        
               | bruce511 wrote:
               | >> The cost of Linux kernel development alone has been
               | estimated at a few $B (https://dwheeler.com/essays/linux-
               | kernel-cost.html), current figure is probably over tens
               | of billions.
               | 
               | I followed your link, but it doesn't seem to bear out
               | your assertion. The two numbers mentioned in the article
               | are 176 mil and 612 mil. Mind you those weren't an
               | estimate of cost, but rather an estimate to replace.
               | Article is dated 2004, with an update in 2011.
               | 
               | Using the lines-of-code estimation it crossed a billion
               | in 2010 - again _to replace_. That has no relation to
               | what it did actually cost.
               | 
               | Getting from there to "tens of billions" seems a stretch.
               | Assuming a bottom value in your estimate of 20 billion,
               | and assuming a developer costs a million a year, that's
               | 20 000 man-years of effort. Which implies something like
               | 2000 people (very well paid people) working continuously
               | for the last decade.
               | 
               | Which seems, well, unlikely.
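
The sanity check in the comment above can be reproduced in a few lines. This is only a sketch of that arithmetic: the $20 billion floor, the $1 million per developer-year, and the 10-year window are the commenter's assumptions, and the function name is made up for illustration.

```python
def implied_headcount(total_cost_usd, cost_per_dev_year_usd, years):
    """Developers needed, working full time for `years`, to reach the total cost."""
    dev_years = total_cost_usd / cost_per_dev_year_usd  # person-years of effort
    return dev_years / years

# Figures taken from the comment above (the commenter's assumptions).
total_cost = 20_000_000_000      # "bottom value" of the tens-of-billions claim
cost_per_dev_year = 1_000_000    # deliberately generous cost per developer-year
years = 10                       # "the last decade"

person_years = total_cost / cost_per_dev_year
headcount = implied_headcount(total_cost, cost_per_dev_year, years)
print(f"{person_years:,.0f} person-years, about {headcount:,.0f} developers")
```

Lowering the cost per developer-year or lengthening the window changes the implied headcount proportionally, which is the point the replies go on to argue about.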
        
               | olddustytrail wrote:
               | There are around 5000 active kernel devs, they are
               | generally highly skilled and therefore highly paid, and
               | they've been working for a lot longer than 10 years.
               | 
               | So doesn't seem that unlikely based on your estimates.
        
               | stevage wrote:
               | Highly paid like a million a year? Is that a thing?
        
               | senko wrote:
               | Linux kernel has been in development since the nineties,
               | not just for the last ten years. Also 5000 contributors
               | is a lot more than 2000 from gp's comment.
               | 
               | Let's ignore the years before dotcom boom since the dev
               | community was probably much smaller, and assume an
               | average of 3500 contributors since.
               | 
               | That's 25 years * 3500 contributors on average * 200k
               | salary (total employee cost, not take home) = $17.5b
               | 
               | Napkin math, but order of magnitude checks out.
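
The napkin math above, written out as a sketch. The 25 years, 3500 average contributors, and $200k total cost per contributor-year are the commenter's rough assumptions, not measured figures, and the function name is hypothetical.

```python
def labor_cost(years, avg_contributors, cost_per_contributor_year_usd):
    """Total labor cost as years x average headcount x yearly cost per person."""
    return years * avg_contributors * cost_per_contributor_year_usd

# Inputs as stated in the comment above.
estimate = labor_cost(years=25, avg_contributors=3500,
                      cost_per_contributor_year_usd=200_000)
print(f"${estimate / 1e9:.1f}b")
```

Because the estimate is a plain product, an error of 2x in any single input moves the result by 2x, so it supports an order-of-magnitude claim and little more.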
        
               | senko wrote:
               | > The two numbers mentioned in the article are 176 mil
               | and 612 mil.
               | 
               | Those two numbers are from the intro. The postscript and
               | the updates at the end mention $1.4b and $3b
               | respectively.
               | 
               | The real cost is probably impossible to calculate, but
               | that order of magnitude is a reasonable estimate IMHO,
               | and absolutely comparable to, or even larger than, compute
               | costs for SOTA LLMs
        
           | andyk wrote:
           | (reposting from locallama and lower down here) yep that's
           | true.
           | 
           | one of my goals is to inspire and honor those that work on
           | open source AI. Those people tend to be motivated by things
           | like impact and the excitement of being part of something
           | big. i know that's how i always feel when i'm around Berkeley
           | and get to meet or work with OG BSD hackers or the people who
           | helped invent core internet protocols.
           | 
           | those people are doing this kind of OSS work and sharing it
           | with the world anyway, without any cash prize. i think of
           | this as a sort of thank you gift for them. and also a way to
           | maybe convince a few people to explore that path who might
           | not have otherwise.
        
           | frgtpsswrdlame wrote:
           | If you're the only one that can come close. Kaggle
           | competition prizes are about focusing smart people on the
           | same problem. But it's very rare for one team to blow all the
           | others out of the water. So if you wanted to make a business
           | out of the problem kaggle will (probably) show the best you
           | could do and still have no moat.
        
       | kenjackson wrote:
       | Surprised to see Amazon Q Developer already at 55% on the
       | verified suite.
       | 
       | But what I appreciate even more is that we keep pushing the bar
       | for what an AI can/should be able to do. Excited to track this
       | benchmark over time.
        
       | Upvoter33 wrote:
       | Very cool to see "outcome oriented" prizes like this -- it's
       | another way to fund research, perhaps. Will be curious to track
       | who does this and whether success in the prize correlates with
       | deep innovation ...
        
       | xianshou wrote:
       | SWE-bench with a private final eval, so you can't hack the test
       | set!
       | 
       | In a perfect world this wouldn't be necessary, but in the current
       | research environment where benchmarks are the primary currency
       | and are usually taken at face value, more unbiased evals with
       | known methodology but hidden tests are exactly what we need.
       | 
       | Also one reason why, for instance, I trust small but well-curated
       | benchmarks such as Aider (https://aider.chat/docs/leaderboards/)
       | or Wolfram (https://www.wolfram.com/llm-benchmarking-
       | project/index.php.e...) over large, widely targeted, and
       | increasingly saturated or gamed benchmarks such as LMSYS Arena or
       | HumanEval.
       | 
       | Goodhart's law is thriving and it's our duty to fight it.
        
       | theogravity wrote:
       | What would be an example of cheating, since it says "no cheating"?
        
         | NitpickLawyer wrote:
         | The only reasonable way to cheat on this would be to find real
         | bugs in many repos, train your models on the solutions, wait
         | till the cut-off period, report those bugs, propose PRs and
         | hope your bugs get selected. Pretty small chances, tbh and
         | probably not worth the rewards (the 90% solve rate is pretty
         | much impossible given the constraints - 4x L4s and ~4-6min /
         | problem. There's no way any models that can be run on those
         | machines under those time limits are that much better than the
         | SotA frontier models)
        
         | zamadatix wrote:
         | The "when they can't cheat" comment relates to the "Why make a
         | contamination free version?" section.
        
         | andyk wrote:
         | That has a double meaning - half tongue in cheek.
         | 
         | 1) since we are creating a contamination-free version of SWE-
         | bench (i.e. scraping a new test set after submissions are
         | frozen) it is guaranteed that agents in this contest can't
         | "cheat", i.e., models can't have trained on the benchmark /
         | agents can't memorize answers.
         | 
         | 2) as a general rule in life, don't cheat on things (not that
         | there aren't exceptions)
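
The idea in point 1 (build the test set only from issues that appear after submissions are frozen) can be sketched as a simple date filter. This is an illustration, not the competition's actual pipeline: the freeze date, the issue records, and the function name are all hypothetical.

```python
from datetime import datetime, timezone

def contamination_free(issues, freeze_time):
    """Keep only issues opened strictly after submissions were frozen."""
    return [issue for issue in issues if issue["created_at"] > freeze_time]

# Hypothetical freeze date and issue records, for illustration only.
freeze_time = datetime(2025, 3, 1, tzinfo=timezone.utc)
issues = [
    {"id": 101, "created_at": datetime(2025, 2, 20, tzinfo=timezone.utc)},
    {"id": 102, "created_at": datetime(2025, 3, 5, tzinfo=timezone.utc)},
    {"id": 103, "created_at": datetime(2025, 4, 1, tzinfo=timezone.utc)},
]

test_set = contamination_free(issues, freeze_time)
print([issue["id"] for issue in test_set])
```

Since every kept issue postdates the frozen submissions, a submitted model cannot have seen it or its fix during training.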
        
       | minimaxir wrote:
       | The author posted about the original tweet announcement a couple
       | days ago: https://news.ycombinator.com/item?id=42413392
       | 
       | In response to my comment of "Realistically, an AI that can
       | perform that well is worth a lot, lot more than $1M.", he said:
       | 
       | > yeah i agree. one of my goals is to inspire and honor those
       | that work on open source AI.
       | 
       | > people who work on open source tend to be motivated by things
       | like impact and the excitement of being part of something bigger
       | than themselves - at least that's how i always feel when i'm
       | around Berkeley and get to meet or work with OG BSD hackers and
       | people who helped invent core internet protocols or the guys who
       | invented RISC or more recently RISC-V
       | 
       | > those people are going to do this kind of OSS work and share it
       | with the world anyway, without any cash prize. i think of this as
       | a sort of thank you gift for them. and also a way to maybe
       | convince a few people to explore that path who might not have
       | otherwise.
        
       | Mistletoe wrote:
       | Idk how accurate net worth predictors on the web are but it says
       | his net worth is $20 million. Is this from his personal funds?
        
         | SamvitJ wrote:
         | He is a Databricks co-founder:
         | https://www.forbes.com/sites/kenrickcai/2021/05/26/accidenta...
         | 
         | Likely worth at least $500M, even if his stake was smaller than
         | some of the other co-founders.
        
         | andyk wrote:
         | yes the prize money is from me to the winners
        
       | vouaobrasil wrote:
       | This is why AI advances so quickly. There are easy economic
       | mechanisms to encourage it, while AI safety laws have to go
       | through an arduous process. Seems rather lopsided when the
       | technology can be potentially dangerous. We should have
       | mechanisms to take a step back and examine this stuff with more
       | caution, mechanisms which have equal force to economic force but
       | we don't. The Amish have a much better model.
        
         | MichaelZuo wrote:
         | Who gets to define 'potentially dangerous'?
         | 
         | Isn't that the core issue in the first place?
        
           | vouaobrasil wrote:
           | We should have a discussion and debate about it. The point
           | is, IF people decide that it IS dangerous, then there is no
           | mechanism to stop it.
        
             | MichaelZuo wrote:
             | Huh? There are well known mechanisms after that point. Such
             | as a resolution of the UN Security Council.
        
               | vouaobrasil wrote:
               | Really? Let's say the bottom 70% of earners in the U.S.
               | decided that AI was dangerous and its development should
               | be stopped. Do you think the top 30% would allow that?
        
               | MichaelZuo wrote:
               | How does this relate to the Security Council or any other
               | known mechanisms that operate in the world?
        
       | neonate wrote:
       | https://twitter.com/andykonwinski/status/1867015050403385674
       | 
       | https://kprize.ai/
        
       | stevage wrote:
       | Man, imagine having a million bucks to just give away to
       | something you think is cool.
        
       | andyk wrote:
       | andy here - happy to answer questions.
       | 
       | Also, I answered a bunch of questions yesterday on LocalLLaMA
       | that people here might find interesting
       | https://www.reddit.com/r/LocalLLaMA/comments/1hdfng5/ill_giv...
        
       ___________________________________________________________________
       (page generated 2024-12-16 23:00 UTC)