[HN Gopher] Problem solving across 100,633 lines of code - Gemini 1.5 Pro Demo
___________________________________________________________________
Problem solving across 100,633 lines of code - Gemini 1.5 Pro Demo
[video]
Author : dario_satu
Score : 75 points
Date : 2024-02-15 18:51 UTC (4 hours ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| someotherperson wrote:
| It's really interesting to see where all of this is going. I
| guess a large part of the best practices behind clearly naming
| things for human interpretation has allowed for training and
| evaluation of things by ML models too.
|
| Meta, but the speaker sounds eerily close to Mark Zuckerberg.
| riku_iki wrote:
| Companies will fire 70% of boilerplate coders in the following
| years
| visarga wrote:
| Or just write 3x more boilerplate. I don't think we are
| saturated with software yet.
| jakewins wrote:
| Na we'll all be moved to perpetual on-call, every day an
| endless fire drill as hundreds of services are launched on
| top of the crumbling crash looping burning landscape of
| millions of services launched last quarter, a Mad Max world
| of endless adrenaline and New Relic AI-enhanced alerts.
| riku_iki wrote:
| but some companies which build reliable old school software
| will win the market?..
| cheese4242 wrote:
| Why would anybody trust this after they faked the last Gemini
| demo?
| buildbot wrote:
| Exactly. Give us access to the model and let independent
| researchers test it. OpenAI did this with GPT4, opening access
| publicly and giving deeper access to researchers within and
| outside of Microsoft.
|
| I simply don't believe the model is that good. Otherwise, maybe
| try to compete with OpenAI directly?
| marfil wrote:
| Wonder why they're not just giving us access, if it's indeed
| so good? Seems it's just to generate some noise and hype
| around Gemini. Hardly believable after the previous faked
| demo, as someone already said.
| 1oooqooq wrote:
| Even the demo is now careful to show curated but possible
| things. They learned their lesson.
|
| The code changes are the most common tutorials you can find on
| the web: adding a speed slider, for instance. The terrain
| tutorials are literally called "height maps" and focus on making
| it taller or flatter.
| sjwhevvvvvsj wrote:
| The lesson was don't get caught, not don't do it.
| mnk47 wrote:
| To be fair, they mostly faked the near instantaneous, real-time
| flow of the conversations. The answers were, as far as I know,
| legit. But I still agree that we should be skeptical.
| 2099miles wrote:
| The prompts they used were also different from the ones shown:
| "is this the right order" was actually "is this the right order,
| consider the distance from the sun". They noted this in their
| post on the Google dev blog.
|
| This one seems to be super straightforward about timeliness
| and capabilities, but the examples might be a bit simpler
| than people think. This is pretty amazing, but like someone
| else said, you could achieve similar results with RAG, given
| the lack of novelty in these questions and the fact that each
| dealt with pretty independent examples as opposed to using
| custom code developed elsewhere in the codebase.
| m3kw9 wrote:
| Is this where unit tests will be very useful? You ask it to
| fix all the bugs found and make sure it passes all the unit
| tests. This is where all of GitHub's public repos will get
| really interesting forks.
| losvedir wrote:
| I'm pretty excited about the increased context length (e.g. in my
| other comment here[0]), but I'm kind of disappointed by the
| examples here.
|
| The codebase is 100k lines, but the tasks they gave it seemed to
| be focused on just hundreds of those lines. The examples are probably
| largely independent, so it doesn't seem like this is really
| flexing anything a relatively simple RAG approach with a much
| smaller context window couldn't handle. The prompts said "the
| demo that ...", so it's a matter of identifying the demo in
| question and looking at just that code, which is a much smaller
| necessary context. There was the "use the GUI approach from other
| examples" task, which kind of gets there, but that's kind of
| another distinct little bit of code.
|
| In other words, while the codebase has lots of lines, the actual
| inference across them seemed to use relatively few of them, and
| identifying the relevant lines didn't seem that hard to me based
| on the tasks given. That means it could be done with some
| retrieval and a much smaller context window.
|
| From the title, I thought it would be loading all 100k lines into the
| context and then asking some deeper questions like "find the bug"
| that spans several function calls or something like that.
| Something that wouldn't be trivial to accomplish with current
| techniques.
|
| [0] https://news.ycombinator.com/context?id=39384034
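The "relatively simple RAG approach" this comment alludes to can be as little as lexical retrieval over the individual demos, then prompting with only the winners. A toy sketch, with invented file names and a deliberately crude scoring function:

```python
def score(query, text):
    """Crude relevance: count of query words that also appear
    in the candidate text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words)

def retrieve(query, files, k=1):
    """Pick the k most relevant files so the model sees a few
    hundred lines instead of the whole 100k-line codebase."""
    ranked = sorted(files, key=lambda f: score(query, files[f]),
                    reverse=True)
    return ranked[:k]
```

Since prompts like "the demo that ..." name their target almost verbatim, even this word-overlap ranking would usually land on the right self-contained demo.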
| pfdietz wrote:
| I want to see something where we have a big piece of code, and a
| big standards document it purports to implement, and the system
| can answer questions like "is this part of the spec implemented?
| Where is it implemented? What does this piece of code mean
| (w.r.t. the spec)? If I implemented this part of the spec, where
| would the changes go?"
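A rough baseline for the first of those questions, without any model at all, is to cross-reference each spec section against source files by shared identifier-like vocabulary (a toy sketch; the section and file names are invented):

```python
import re

def tokens(text):
    """Identifier-like words of four or more characters."""
    return set(re.findall(r"[a-z_]{4,}", text.lower()))

def map_spec_to_code(spec_sections, code_files):
    """For each spec section, guess the source file with the most
    vocabulary overlap -- a cheap stand-in for asking a model
    'where is this part of the spec implemented?'."""
    mapping = {}
    for name, section in spec_sections.items():
        mapping[name] = max(
            code_files,
            key=lambda f: len(tokens(section) & tokens(code_files[f])),
        )
    return mapping
```

The interesting test of a long-context model is whether it beats this kind of shallow matching on the harder follow-ups, like "if I implemented this part, where would the changes go?".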
| rst wrote:
| What they did in this demo is collect a bunch of small demos,
| small enough that earlier models could have answered questions
| about them, or dinked them, individually, and _mostly_
| demonstrated that the model could figure out which demo was
| pertinent to the question that they were asking, and focus only
| on that.
|
| But the input was still divisible into self-contained little bits
| -- so this is still somewhat different from dumping the full
| source code for a database engine into it, and having it answer
| questions about, say, where foreign key constraints are
| implemented -- or, more dramatically, how several different parts
| of the codebase work together to implement, say, transaction
| isolation levels.
___________________________________________________________________
(page generated 2024-02-15 23:01 UTC)