[HN Gopher] Konwinski Prize
___________________________________________________________________
Konwinski Prize
Author : tosh
Score : 109 points
Date : 2024-12-14 12:10 UTC (2 days ago)
(HTM) web link (andykonwinski.com)
(TXT) w3m dump (andykonwinski.com)
| cs702 wrote:
| Fabulous. A big round of applause for Andy Konwinski and the
| SWE-bench folks!
| thih9 wrote:
| The Kaggle competition page has more details:
| https://www.kaggle.com/competitions/konwinski-prize
|
| The prizes scale with the model's score; the total prize pool is
| between $100,000 and $1,225,000, depending on the top scores.
| optimalsolver wrote:
| >$1M for the AI that can close 90% of new GitHub issues
|
| If your AI can do this, it's worth several orders of magnitude
| more. Just FYI.
| segmondy wrote:
| Exactly. I'll personally pay $2 million to anyone who can do it
| and assign me the full code, weights, and rights.
| noch wrote:
| > I'll personally pay $2 million to anyone who can do it and
| assign me the full code, weights, and rights.
|
| If you are serious, you should put the funds in an escrow
| contract and announce a bounty.
|
| There are many brilliant people who would work on this for
| you.
| andyk wrote:
| I hope the competition will inspire people to make
| breakthroughs in the open, so I won't take any rights to
| the IP; instead, the winning solutions must use open source
| code and open weight models.
| senko wrote:
| And the Linux kernel, curl, SQLite, and plenty of other open
| source projects are worth infinitely more than their purchase
| price.
|
| Also, you cut off the "from the benchmark" part; this doesn't
| expect it to solve any random GitHub issue, just the ones
| from the (presumably manually vetted and cleaned up) benchmark
| dataset.
| minimaxir wrote:
| The Linux kernel, curl, and SQLite don't require the kind of
| compute cost to develop that puts them out of reach of
| hobbyists and within reach only of organizations expecting a
| positive ROI.
| senko wrote:
| The cost of Linux kernel development alone has been estimated
| at a few billion dollars
| (https://dwheeler.com/essays/linux-kernel-cost.html); the
| current figure is probably in the tens of billions.
|
| Also, the prize doesn't require you to train a new
| foundational model, just that whatever you use is open
| weights or open source.
|
| Theoretically, you might get away with Llama 3.3 (or any
| other model you think makes sense) plus a cleverly designed
| agentic system and a fresh codebase-understanding approach,
| at minimal compute cost.
|
| (ok, probably not that easy, but just saying there's much
| more to AI coding than the underlying model)
| bruce511 wrote:
| >> The cost of Linux kernel development alone has been
| estimated at a few billion dollars
| (https://dwheeler.com/essays/linux-kernel-cost.html); the
| current figure is probably in the tens of billions.
|
| I followed your link, but it doesn't seem to bear out your
| assertion. The two numbers mentioned in the article are $176
| million and $612 million. Mind you, those weren't estimates of
| actual cost, but rather estimates of what it would cost to
| replace. The article is dated 2004, with an update in 2011.
|
| Using the lines-of-code estimation, it crossed a billion in
| 2010 - again, _to replace_. That has no relation to what it
| actually cost.
|
| Getting from there to "tens of billions" seems a stretch.
| Taking the bottom of your estimate, $20 billion, and assuming
| a developer costs a million a year, that's 20,000 man-years
| of effort. Which implies something like 2,000 (very well
| paid) people working continuously for the last decade.
|
| Which seems, well, unlikely.
| olddustytrail wrote:
| There are around 5,000 active kernel devs; they are generally
| highly skilled and therefore highly paid, and they've been
| working for a lot longer than 10 years.
|
| So it doesn't seem that unlikely based on your estimates.
| stevage wrote:
| Highly paid like a million a year? Is that a thing?
| senko wrote:
| The Linux kernel has been in development since the nineties,
| not just for the last ten years. Also, 5,000 contributors is
| a lot more than the 2,000 from the GP's comment.
|
| Let's ignore the years before the dotcom boom, since the dev
| community was probably much smaller, and assume an average of
| 3,500 contributors since.
|
| That's 25 years * 3,500 contributors on average * $200k
| salary (total employee cost, not take-home) = $17.5B.
|
| Napkin math, but the order of magnitude checks out.
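|
| (If you want to vary the assumptions, here's the same napkin
| math as a tiny Python sketch; the years, contributor count,
| and salary figure are just the guesses above, not real data.)
|
|     # Back-of-the-envelope kernel development cost estimate.
|     years = 25              # assumed development window
|     contributors = 3_500    # assumed average active contributors
|     cost_per_dev = 200_000  # assumed fully loaded annual cost, USD
|
|     total = years * contributors * cost_per_dev
|     print(f"${total / 1e9:.1f}B")  # -> $17.5B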
| senko wrote:
| > The two numbers mentioned in the article are $176 million
| and $612 million.
|
| Those two numbers are from the intro. The postscript and
| the updates at the end mention $1.4b and $3b
| respectively.
|
| The real cost is probably impossible to calculate, but that
| order of magnitude is a reasonable estimate IMHO, and
| absolutely comparable to, or even larger than, the compute
| costs for SOTA LLMs.
| andyk wrote:
| (reposting from LocalLLaMA and from lower down here) yep,
| that's true.
|
| one of my goals is to inspire and honor those that work on
| open source AI. Those people tend to be motivated by things
| like impact and the excitement of being part of something
| big. i know that's how i always feel when i'm around Berkeley
| and get to meet or work with OG BSD hackers or the people who
| helped invent core internet protocols.
|
| those people are doing this kind of OSS work and sharing it
| with the world anyway, without any cash prize. i think of
| this as a sort of thank you gift for them. and also a way to
| maybe convince a few people to explore that path who might
| not have otherwise.
| frgtpsswrdlame wrote:
| Only if you're the only one who can come close. Kaggle
| competition prizes are about focusing smart people on the
| same problem, but it's very rare for one team to blow all the
| others out of the water. So if you wanted to make a business
| out of the problem, Kaggle will (probably) show the best you
| could do, and you'd still have no moat.
| kenjackson wrote:
| Surprised to see Amazon Q Developer already at 55% on the
| verified suite.
|
| But what I appreciate even more is that we keep pushing the bar
| for what an AI can/should be able to do. Excited to track this
| benchmark over time.
| Upvoter33 wrote:
| Very cool to see "outcome oriented" prizes like this -- it's
| another way to fund research, perhaps. Will be curious to track
| who does this and whether success in the prize correlates with
| deep innovation ...
| xianshou wrote:
| SWE-bench with a private final eval, so you can't hack the test
| set!
|
| In a perfect world this wouldn't be necessary, but in the current
| research environment where benchmarks are the primary currency
| and are usually taken at face value, more unbiased evals with
| known methodology but hidden tests are exactly what we need.
|
| This is also one reason why, for instance, I trust small but
| well-curated benchmarks such as Aider
| (https://aider.chat/docs/leaderboards/) or Wolfram
| (https://www.wolfram.com/llm-benchmarking-project/index.php.e...)
| over large, widely targeted, and increasingly saturated or
| gamed benchmarks such as LMSYS Arena or HumanEval.
|
| Goodhart's law is thriving and it's our duty to fight it.
| theogravity wrote:
| What would be an example of cheating, since it says "no
| cheating"?
| NitpickLawyer wrote:
| The only reasonable way to cheat on this would be to find
| real bugs in many repos, train your models on the solutions,
| wait until the cut-off period, report those bugs, propose
| PRs, and hope your bugs get selected. Pretty small chances,
| tbh, and probably not worth the reward (the 90% solve rate is
| pretty much impossible given the constraints - 4x L4 GPUs and
| ~4-6 min / problem. There's no way any model that can be run
| on those machines under those time limits is that much better
| than the SotA frontier models).
| zamadatix wrote:
| The "when they can't cheat" comment relates to the "Why make a
| contamination free version?" section.
| andyk wrote:
| That has a double meaning - half tongue in cheek.
|
| 1) since we are creating a contamination-free version of
| SWE-bench (i.e. scraping a new test set after submissions are
| frozen), it is guaranteed that agents in this contest can't
| "cheat", i.e., models can't have trained on the benchmark /
| agents can't memorize answers.
|
| 2) as a general rule in life, don't cheat on things (not that
| there aren't exceptions)
| minimaxir wrote:
| The author posted about the original tweet announcement a couple
| days ago: https://news.ycombinator.com/item?id=42413392
|
| In response to my comment of "Realistically, an AI that can
| perform that well is worth a lot, lot more than $1M.", he said:
|
| > yeah i agree. one of my goals is to inspire and honor those
| that work on open source AI.
|
| > people who work on open source tend to be motivated by things
| like impact and the excitement of being part of something bigger
| than themselves - at least that's how i always feel when i'm
| around Berkeley and get to meet or work with OG BSD hackers and
| people who helped invent core internet protocols or the guys who
| invented RISC or more recently RISC-V
|
| > those people are going to do this kind of OSS work and share it
| with the world anyway, without any cash prize. i think of this as
| a sort of thank you gift for them. and also a way to maybe
| convince a few people to explore that path who might not have
| otherwise.
| Mistletoe wrote:
| Idk how accurate net worth predictors on the web are, but one
| says his net worth is $20 million. Is this from his personal
| funds?
| SamvitJ wrote:
| He is a Databricks co-founder:
| https://www.forbes.com/sites/kenrickcai/2021/05/26/accidenta...
|
| Likely worth at least $500M, even if his stake was smaller
| than those of some of the other co-founders.
| andyk wrote:
| yes the prize money is from me to the winners
| vouaobrasil wrote:
| This is why AI advances so quickly. There are easy economic
| mechanisms to encourage it, while AI safety laws have to go
| through an arduous process. That seems rather lopsided when
| the technology is potentially dangerous. We should have
| mechanisms to take a step back and examine this stuff with
| more caution - mechanisms with force equal to the economic
| ones - but we don't. The Amish have a much better model.
| MichaelZuo wrote:
| Who gets to define 'potentially dangerous'?
|
| Isn't that the core issue in the first place?
| vouaobrasil wrote:
| We should have a discussion and debate about it. The point
| is, IF people decide that it IS dangerous, then there is no
| mechanism to stop it.
| MichaelZuo wrote:
| Huh? There are well-known mechanisms after that point, such
| as a resolution of the UN Security Council.
| vouaobrasil wrote:
| Really? Let's say the bottom 70% of earners in the U.S.
| decided that AI was dangerous and its development should
| be stopped. Do you think the top 30% would allow that?
| MichaelZuo wrote:
| How does this relate to the Security Council or any other
| known mechanisms that operate in the world?
| neonate wrote:
| https://twitter.com/andykonwinski/status/1867015050403385674
|
| https://kprize.ai/
| stevage wrote:
| Man, imagine having a million bucks to just give away to
| something you think is cool.
| andyk wrote:
| andy here - happy to answer questions.
|
| Also, I answered a bunch of questions yesterday on LocalLLaMA
| that people here might find interesting
| https://www.reddit.com/r/LocalLLaMA/comments/1hdfng5/ill_giv...
___________________________________________________________________
(page generated 2024-12-16 23:00 UTC)