[HN Gopher] Problem solving across 100,633 lines of code - Gemini 1.5 Pro Demo
___________________________________________________________________
Problem solving across 100,633 lines of code - Gemini 1.5 Pro Demo
[video]
Author : dario_satu
Score : 75 points
Date : 2024-02-15 18:51 UTC (4 hours ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| someotherperson wrote:
| It's really interesting to see where all of this is going. I
| guess a large part of the best practices behind clearly naming
| things for human interpretation has allowed for training and
| evaluation of things by ML models too.
|
| Meta, but the speaker sounds eerily close to Mark Zuckerberg.
| riku_iki wrote:
| Companies will fire 70% of boilerplate coders in the following
| years
| visarga wrote:
| Or just write 3x more boilerplate. I don't think we are
| saturated with software yet.
| jakewins wrote:
| Na we'll all be moved to perpetual on-call, every day an
| endless fire drill as hundreds of services are launched on
| top of the crumbling crash looping burning landscape of
| millions of services launched last quarter, a Mad Max world
| of endless adrenaline and New Relic AI-enhanced alerts.
| riku_iki wrote:
| but some companies which build reliable old school software
| will win the market?..
| cheese4242 wrote:
| Why would anybody trust this after they faked the last Gemini
| demo?
| buildbot wrote:
| Exactly. Give us access to the model and let independent
| researchers test it. OpenAI did this with GPT4, opening access
| publicly and giving deeper access to researchers within and
| outside of Microsoft.
|
| I simply don't believe the model is that good. Otherwise, maybe
| try to compete with OpenAI directly?
| marfil wrote:
| Wonder why they're not just giving us access, if it's indeed
| so good? Seems it's just to generate some noise and hype
| around Gemini. Hardly believable after the previous faked
| demo, as someone already said.
| 1oooqooq wrote:
| Even the demo is now careful to show curated but possible
| things. They learned their lesson.
|
| The code changes are the most common tutorials you can find on
| the web: adding a speed slider, for instance. The terrain
| tutorials are literally called "height maps" and focus on making
| it taller or flatter.
| sjwhevvvvvsj wrote:
| The lesson was don't get caught, not don't do it.
| mnk47 wrote:
| To be fair, they mostly faked the near instantaneous, real-time
| flow of the conversations. The answers were, as far as I know,
| legit. But I still agree that we should be skeptical.
| 2099miles wrote:
| The prompts they used were also different from the ones shown:
| "is this the right order" was actually "is this the right order,
| consider the distance from the sun". They noted this in their
| post on the Google dev blog.
|
| This one seems to be super straightforward about timeliness
| and capabilities, but the examples might be a bit simpler
| than people think. This is pretty amazing, but like someone
| else said, you could achieve similar results with RAG, given
| the lack of novelty in these questions and the fact that each
| dealt with pretty independent examples as opposed to using
| custom code developed elsewhere in the codebase.
| m3kw9 wrote:
| Is this where unit tests will be very useful? You ask it to
| fix all the bugs found and make sure it passes all the unit
| tests. This is where all of GitHub's public repos will get
| really interesting forks.
| losvedir wrote:
| I'm pretty excited about the increased context length (e.g. in my
| other comment here[0]), but I'm kind of disappointed by the
| examples here.
|
| The codebase is 100k lines, but the tasks they gave it seemed to
| be focused on just hundreds of those lines. The examples are probably
| largely independent, so it doesn't seem like this is really
| flexing anything a relatively simple RAG approach with a much
| smaller context window couldn't handle. The prompts said "the
| demo that ...", so it's a matter of identifying the demo in
| question and looking at just that code, which is a much smaller
| necessary context. There was the "use the GUI approach from other
| examples" task, which kind of gets there, but that's kind of
| another distinct little bit of code.
|
| In other words, while the codebase has lots of lines, the actual
| inference across them seemed to use relatively few of them, and
| identifying the relevant lines didn't seem that hard to me based
| on the tasks given. That means it could be done with some
| retrieval and a much smaller context window.
|
| From the title, I thought it would be loading all 100k lines into the
| context and then asking some deeper questions like "find the bug"
| that spans several function calls or something like that.
| Something that wouldn't be trivial to accomplish with current
| techniques.
|
| [0] https://news.ycombinator.com/context?id=39384034
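The "relatively simple RAG approach" this comment alludes to can be as little as lexical retrieval over the individual demos, then prompting with only the winners. A toy sketch, with invented file names and a deliberately crude scoring function:

```python
def score(query, text):
    """Crude relevance: count of query words that also appear
    in the candidate text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words)

def retrieve(query, files, k=1):
    """Pick the k most relevant files so the model sees a few
    hundred lines instead of the whole 100k-line codebase."""
    ranked = sorted(files, key=lambda f: score(query, files[f]),
                    reverse=True)
    return ranked[:k]
```

Since prompts like "the demo that ..." name their target almost verbatim, even this word-overlap ranking would usually land on the right self-contained demo.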
| pfdietz wrote:
| I want to see something where we have a big piece of code, and a
| big standards document it purports to implement, and the system
| can answer questions like "is this part of the spec implemented?
| Where is it implemented? What does this piece of code mean
| (w.r.t. the spec)? If I implemented this part of the spec, where
| would the changes go?"
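A rough baseline for the first of those questions, without any model at all, is to cross-reference each spec section against source files by shared identifier-like vocabulary (a toy sketch; the section and file names are invented):

```python
import re

def tokens(text):
    """Identifier-like words of four or more characters."""
    return set(re.findall(r"[a-z_]{4,}", text.lower()))

def map_spec_to_code(spec_sections, code_files):
    """For each spec section, guess the source file with the most
    vocabulary overlap -- a cheap stand-in for asking a model
    'where is this part of the spec implemented?'."""
    mapping = {}
    for name, section in spec_sections.items():
        mapping[name] = max(
            code_files,
            key=lambda f: len(tokens(section) & tokens(code_files[f])),
        )
    return mapping
```

The interesting test of a long-context model is whether it beats this kind of shallow matching on the harder follow-ups, like "if I implemented this part, where would the changes go?".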
| rst wrote:
| What they did in this demo is collect a bunch of small demos,
| small enough that earlier models could have answered questions
| about them, or dinked them, individually, and _mostly_
| demonstrated that the model could figure out which demo was
| pertinent to the question that they were asking, and focus only
| on that.
|
| But the input was still divisible into self-contained little bits
| -- so this is still somewhat different from dumping the full
| source code for a database engine into it, and having it answer
| questions about, say, where foreign key constraints are
| implemented -- or, more dramatically, how several different parts
| of the codebase work together to implement, say, transaction
| isolation levels.
___________________________________________________________________
(page generated 2024-02-15 23:01 UTC)