[HN Gopher] GitHub code search is generally available
       ___________________________________________________________________
        
       GitHub code search is generally available
        
       Author : todsacerdoti
       Score  : 169 points
       Date   : 2023-05-08 16:01 UTC (6 hours ago)
        
 (HTM) web link (github.blog)
 (TXT) w3m dump (github.blog)
        
       | j1elo wrote:
       | I'm happy that at last, my Stack Overflow question and answer
       | have been fully solved with technology improvements!
       | 
       | https://stackoverflow.com/questions/43891605/search-partial-...
       | 
       | It's been almost _6 years_, though... for a search scenario that
       | would be trivial to implement with _grep_ (at scale that's
       | another thing...) Still, a nice example of perfect being the
       | enemy of good, I guess.
        
         | 100k wrote:
         | Thanks for your patience! It has been a long road with some
         | dead ends (we've wanted to add this since 2012 at least). We
         | actually wrote about why we didn't just use grep in our last
         | blog post: https://github.blog/2023-02-06-the-technology-
         | behind-githubs...
        
           | RulerOf wrote:
           | I really enjoyed that blog post. Especially the comparisons
           | that lay out how feasible it can actually be to do the
           | stupid, simple thing of "grep the whole data set every time"
           | up to a surprising point.
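
The "grep the whole data set every time" approach mentioned above can be sketched in a few lines. This is only an illustrative toy, not GitHub's implementation: the `FILES` dictionary and the `grep_all` helper are hypothetical names standing in for a real corpus and a real scanner.

```python
import re

# Hypothetical in-memory "data set" standing in for files on disk.
FILES = {
    "a.py": "import os\nprint(os.getcwd())\n",
    "b.py": "def main():\n    return 0\n",
}


def grep_all(files, pattern):
    """Scan every line of every file on each query; no index is built."""
    regex = re.compile(pattern)
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((path, lineno, line.strip()))
    return hits
```

Each query costs time proportional to the total size of the corpus, which is why this works up to a point and then stops being practical.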
        
       | synergy20 wrote:
       | hold on, I paid for copilot, what does github-code-search buy for
       | me? not to mention I can kind of search its code already in the
       | past.
        
       | abathur wrote:
       | I generally like the new code search, but I've got one big gripe:
       | there's no way to sort code results by any kind of proxy for
       | recency.
       | 
       | The old code search had the ability to sort by indexed date. This
       | wasn't perfect, but it was something.
       | 
       | I like keeping up with who's using my code and whether they're
       | leaving comments or commit chains that outline trouble they're
       | having with it. Sometimes old code pops up in the recently-
       | indexed sort, but if I regularly search and look at the top page,
       | I can see _most_ new uses.
       | 
       | Without it, code search is basically useless for this purpose :/
        
         | 100k wrote:
         | (I work on code search.) Yeah, sorry about that. We've heard
         | this feedback a lot. There are two reasons why we haven't
         | implemented this. First, content is shared between repositories
         | which makes this harder than before, when it wasn't. Second, we
         | rebuild the index weekly or even more frequently, so the proxy
         | of "when was this added" that was used doesn't work any more.
         | What we would _like_ to use is "when was this blob added to
         | this branch" but that's extremely expensive to retrieve from
         | Git because Git trees don't record it.
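
A rough way to see why "when was this blob added to this branch" is expensive: a tree only says which blob a path points to, so the answer has to be recovered by walking history. The sketch below is a toy model under loud assumptions (a linear history held in memory as `(commit_id, tree)` pairs, newest first). It does not use Git's real data structures or any Git API, and `HISTORY` and `commit_that_added` are hypothetical names.

```python
# Toy linear history, newest commit first. Each tree maps path -> blob id.
HISTORY = [
    ("c3", {"lib.py": "b2", "README": "r1"}),
    ("c2", {"lib.py": "b2"}),
    ("c1", {"lib.py": "b1"}),
]


def commit_that_added(history, path, blob_id):
    """Walk back from the tip while `path` still points at `blob_id`.

    Returns the oldest commit in that unbroken run, or None if the tip
    does not contain the blob at that path.
    """
    added_in = None
    for commit_id, tree in history:
        if tree.get(path) != blob_id:
            break  # the blob was not here yet (or was different)
        added_in = commit_id
    return added_in
```

Even in this simplified model the lookup is a scan over commits per blob, and real histories add merges and many branches on top of that.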
        
           | ofek wrote:
           | Does this mean it will not be implemented?
        
             | 100k wrote:
             | We want to do it right if we do implement it, but I can't
             | promise anything concretely. It's not trivial,
             | unfortunately.
        
               | swyx wrote:
               | maybe lower standards would help - what does doing it
               | not-quite-right-but-in-one-week look like? can mitigate
               | by setting expectations accordingly
        
       | colin353 wrote:
       | I'm Colin from GitHub's code search team, happy to answer any
       | questions.
       | 
       | For more info on how we built this, you can check out our
       | technical blog post from a few months ago
       | https://github.blog/2023-02-06-the-technology-behind-githubs...
        
         | airstrike wrote:
         | I imagine a future in which this is integrated into vscode so I
         | can go from an error message in the terminal to a search
         | through my code + third-party modules that my code is importing
        
         | davidrjenni wrote:
         | How does it compare to Sourcegraph? What is the main
         | differentiator?
        
         | thinkingemote wrote:
         | Great stuff, is there any update on searches which include code
         | in branches? I often manually find interesting work done in
         | development branches of cloned repos of the one I'm focused on
         | but which never sees the light of day and not found in search.
         | I imagine having the network also part of the search would be a
         | good facet.
        
         | lexh wrote:
         | Will this make it to GH enterprise eventually?
        
           | colin353 wrote:
           | Yes, we're working on bringing it to GitHub enterprise right
           | now.
        
             | lexh wrote:
             | Fantastic. Love the functionality on public GH and always
             | find myself missing it at work.
        
         | catchmeifyoucan wrote:
         | Loving the new Code Search! Might be super specific, but is
         | there any syntax for searching attributes in HTML elements. For
         | example if a React Component called <Button ...some-props
         | color="red" /> what's the best way to find all the buttons that
         | are red?
        
           | colin353 wrote:
           | Hmm, you can construct a regular expression, something like:
           | lang:tsx /<Button[^\\].\* color="red"/
           | 
           | Example:
           | 
           | https://github.com/search?q=lang%3Atsx+%2F%3CButton%5B%5E%5C.
           | ..
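
To try the idea locally, here is a minimal Python sketch of a regex in the same spirit. The pattern is an assumption written for this example, not the exact query from the comment above (which is truncated), and `BUTTON_RED`, `SAMPLES`, and `red_buttons` are hypothetical names.

```python
import re

# Assumed pattern: an opening <Button tag with color="red" somewhere
# among its attributes, before the tag's first ">".
BUTTON_RED = re.compile(r'<Button\b[^>]*\bcolor="red"')

SAMPLES = [
    '<Button size="lg" color="red" />',
    '<Button color="blue" />',
    '<IconButton color="red" />',
]


def red_buttons(lines):
    """Return the lines that contain a red <Button ...> element."""
    return [line for line in lines if BUTTON_RED.search(line)]
```

Because `[^>]*` stops at the first `>`, this sketch will miss buttons whose earlier props contain that character (for example an inline arrow function), so it is a rough filter rather than a parser.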
        
             | arthurcolle wrote:
             | You can do code blocks with 4 consecutive spaces. Backticks
             | are not supported unfortunately
        
             | catchmeifyoucan wrote:
             | Ooh nice, this looks great! Thanks!!
        
         | panic wrote:
         | Are you planning to release code search as open source? I can't
         | find a link to the source code anywhere.
        
         | jhgg wrote:
         | It would be really awesome if code search could one day consume
         | LSIF for precise results in its index similar to Sourcegraph.
         | The symbol search is good now, but approximate. Having more
         | precise code search by allowing devs to upload LSIF data in
         | their CI pipelines would allow for precise symbol search (go to
         | definition / find usages actually being accurate) and remove
         | irrelevant results.
        
           | colin353 wrote:
           | Great point. Yes, we initially focused on zero-config
           | approximate code navigation. But we do intend to support
           | build-based code navigation in the future, since the
           | approximate code navigation experience can be pretty poor for
           | some languages (e.g. C/C++).
        
         | dmix wrote:
         | Any plans to add support for Vue SFC syntax highlighting?
         | 
         | Edit: correction it looks like that's been fixed since last
         | week, nm
        
       | ren_engineer wrote:
       | Surprised more people aren't talking about how Microsoft has a
       | near monopoly on the developer ecosystem. They've got GitHub,
       | OpenAI, and VS Code all working together and collecting data that
       | strengthen each other's products while also using their embrace,
       | extend, extinguish strategy with WSL and all of these steer
       | people towards Azure services whenever possible. Seems like
       | something that verges on an anti-trust situation when you think
       | about the flywheel effect data has for AI
       | 
       | credit to Microsoft for rehabbing their reputation with
       | developers but it seems like a massive trojan horse
        
         | synergy20 wrote:
         | except windows itself is not loved by most developers,
         | microsoft will take over the developer world when it replaces
         | its windows with linux fully (instead of WSL2, which is nice but
         | not great)
        
           | charlieyu1 wrote:
           | Can't see Linux replacing windows, their most profitable
           | products (Windows and MS Office) are both based on closed
           | ecosystem
        
             | rad_gruchalski wrote:
             | I don't know. I'm happily using ms office on a mac and in
             | the browser.
        
             | Shared404 wrote:
             | Isn't Azure a much higher percentage of profit than Windows
             | or Office for MS at this point?
        
               | tcmart14 wrote:
               | I don't know the numbers by heart. But if it isn't a
               | higher percentage of profit right now, it certainly takes
               | the cake for largest growth percentage with it eclipsing
               | everything else soon (if it hasn't already).
        
           | waboremo wrote:
           | Part of the reason Windows isn't loved by developers is also
           | hardware. So a switch to Linux won't fix this, unless they
           | made the switch when Apple was releasing those horrific
           | keyboards!
        
         | manojlds wrote:
         | Meanwhile CMA: you are leading cloud gaming and hence can't
         | acquire Activision.
        
         | tester756 wrote:
         | "monopoly on developer ecosystem"
         | 
         | GitHub? fair
         | 
         | OpenAI? how is this a part of dev. ecosystem?
         | 
         | Vs Code? wtf? there's a lot of other IDEs/editors and many
         | would argue that they are better
         | 
         | >embrace, extend, extinguish strategy with WSL
         | 
         | They are EEEing their product - Windows?
        
           | VirusNewbie wrote:
           | I mean google uses VSCode internally as their officially
           | supported IDE... i'd say they're doing pretty well.
        
           | capableweb wrote:
           | > > embrace, extend, extinguish strategy with WSL
           | 
           | > They are EEEing their product - Windows?
           | 
           | No, Linux obviously.
           | 
           | First they like and integrate Linux into their own products.
           | Azure, WSL and others.
           | 
           | Then, they provide extensions that are closed-source on top
           | of those.
           | 
           | With the goal to extinguish the original project so they have
           | more control over the direction.
        
         | linhns wrote:
         | I think there was a lot of discussion on this when Microsoft
         | took over GitHub but as time goes people kind of accepted the
         | reality.
        
         | AnonMO wrote:
         | Github -> gitlab, vs code -> jetbrains. why use wsl just go to
         | linux. No one forces you to use their products they're just a
         | better developer experience imo aside from windows. Plenty of
         | competition in the space. The question is does better equal
         | monopoly?
        
         | chillel wrote:
         | I guess Steve Ballmer was right all along...
        
           | misterprime wrote:
           | About developers?
        
             | jansan wrote:
             | No, about Playday:
             | https://www.youtube.com/watch?v=V7PYQCXdX3A
        
               | misterprime wrote:
               | Highly amusing.
        
         | zzzzzzzza wrote:
         | gitea
        
           | synergy20 wrote:
           | good for small projects, I guess the real competitor is
           | gitlab
        
           | makapuf wrote:
           | ... is nice but needs the social network effects. Maybe
           | adding some federation and stars / comments as a protocol
           | (not just a program) could help. Maybe it exists and lacks
           | coherence/ publicity.
        
             | pydry wrote:
             | One of the best things that could be done for open source
             | is to break the monopoly on those damn stars.
             | 
             | Open source would be a lot healthier if social proof were
             | portable across platforms.
        
               | evilspammer wrote:
               | ...who has ever cared about stars?
        
               | rad_gruchalski wrote:
               | "Our product has 5000 stars on GitHub, therefore give us
               | money, it's a business opportunity". Seen plenty of those
               | on LinkedIn.
        
         | JohnFen wrote:
         | > more people aren't talking about how Microsoft has a near
         | monopoly on the developer ecosystem.
         | 
         | But do they? In my day job, outside of the occasional use of
         | Visual Studio and developing on a Windows machine, I use no
         | Microsoft products for development.
         | 
         | > credit to Microsoft for rehabbing their reputation with
         | developers
         | 
         | With a fair number of younger developers, but certainly not
         | all. Most devs I know don't think of Microsoft any more kindly
         | now than in the past.
        
           | tcmart14 wrote:
           | I don't think they have one today, but it is looking like
           | they could soon have one. Especially since I believe recent
           | reports show that Azure is starting to outpace AWS in adoption?
           | So you have .NET, Visual Studio (Code), Azure, Github,
           | OpenAI, Windows, and I am pretty sure more I am forgetting
           | about. I think the big one that wasn't initially mentioned
           | was Azure.
        
             | JohnFen wrote:
             | All of that adds up to having a very significant, perhaps
             | majority, of the market locked up. But it's also all
             | centering around a particular sort of product and product
             | development. Microsoft might be able to lock up that
             | segment, but I don't think they're in a position to
             | monopolize the larger software development space in the
             | near or medium future.
        
         | capableweb wrote:
         | Don't forget TypeScript and npm as well which basically covers
         | 99% of the JavaScript ecosystem if not more.
         | 
         | Then Dependabot for large swaths of more developers outside
         | of the earlier mentioned ecosystems. LinkedIn for everyone's
         | career.
        
         | dahwolf wrote:
         | I would expect them to purchase StackOverflow too. I guess it's
         | not that essential, seems a low cost acquisition.
        
           | Kuinox wrote:
           | Why would they purchase StackOverflow ? They are partnered
           | with the StackOverflow killer: ChatGPT.
        
             | atq2119 wrote:
             | If StackOverflow dies, surely the developer-relevant
             | quality of training data will suffer?
             | 
             | After all, the capabilities of ChatGPT are basically
             | proportional to how well a topic is represented in the
             | training data, which is largely the internet.
        
         | paulddraper wrote:
         | Now they just need StackOverflow.
         | 
         | They already run a .NET stack....
        
         | rj1 wrote:
         | i think it extends further than that, since they have: vscode,
         | github, linkedin, npm, typescript, chatgpt. for many, this is
         | almost the entire developer ecosystem.
         | 
         | at a high level they pretend to embrace open source but many of
         | the best features of vscode are closed source, such as remote
         | editing and various language servers (pylance, etc.) the lsp
         | saga is particularly unfriendly, since they pushed it as an
         | open standard, tons of people contributed and adopted it, and
         | then they closed the source to their most valuable language
         | servers, making them only compatible with their product
         | (vscode).
         | 
         | there are countless similar examples. the way i see microsoft
         | and the way they want to be perceived are entirely different.
        
       | aetherane wrote:
       | I don't get the praise over GitHub code search. I find it very
       | inaccurate and often missing references etc. Maybe it depends on
       | the language you are using it with? (Go here)
        
         | jacobr1 wrote:
         | Have you tried the new search that was just launched? It seems
         | to have significantly improved search accuracy. I agree the old
         | version wasn't that great (though still was one of the better
         | options for finding usage of things in the wild like rarely
         | used OSS dependencies I needed to debug).
        
           | aetherane wrote:
           | I was in the beta, which I'm assuming was the same. I find it
           | sometimes misses references in the same folder as the file
           | I'm looking at when I do reference search.
        
         | bdcravens wrote:
         | Are you referring to what they just released? Github code
         | search has always been notoriously bad. This is a new search
         | product:
         | 
         | "today, our new code search and code view are generally
         | available to all users on GitHub.com"
        
       | unicornmama wrote:
       | RIP Sourcegraph
        
         | midoBB wrote:
         | In my experience Sourcegraph offers a better integration with
         | Gitlab and Github both. And their code search is far superior.
        
       | h1fra wrote:
       | I have been using it since the beta, truly the most impressive
       | product released in the last 5 years (along with chatgpt). The
       | amount of indexed code, the quickness and the precision of this
       | search is simply stunning.
        
         | jjeaff wrote:
         | Could you elaborate more on why this is a significant feature?
         | 
         | I can see how it would be handy to search a codebase online,
         | especially one you don't have cloned locally, but for my own
         | codebases, I can search the entire thing just fine in VS Code
         | with ctrl-shift-f.
        
           | doodlesdev wrote:
           | It's not only about searching your own repository, it allows
           | you to search through every single public repository on
           | GitHub. I personally use it a lot to learn more obscure APIs
           | which are badly documented or which I'm just not used to,
           | simply search for the method I'm trying to use and find
           | infinite examples of real world usage, along with the code
           | license right next to it.
           | 
           | It's also great if you drank the GitHub kool-aid as you can
           | do a single search and find related code snippets, issues,
           | pull requests and discussions that could possibly help. I'm
           | personally not too big into the ecosystem, in fact I'm
           | considering moving to Fossil so I can have everything inside
           | the repo, but for those who are it's a great feature.
        
           | h1fra wrote:
           | I think others have pretty much summarised it already:
           | 
           | Searching for non-documented or badly documented API, find
           | implementation of an algorithm or specific pattern, find how
           | people are using a niche tool, etc.
           | 
           | I have even used it to find my own API in the wild to look
           | for potential breaking changes and improvements to do.
        
           | JansjoFromIkea wrote:
           | For me it's been very useful at finding codebases with work
           | done on extremely specific niche things that would've been
           | near impossible to find otherwise (e.g. tools for obscure
           | protocols hidden away on obtusely named repos)
        
           | simonw wrote:
           | Ever wanted to use an API and found the documentation to be
           | lacking?
           | 
           | GitHub code search pretty much solves that. For any API you
           | can find an example of someone else using it.
           | 
           | I've been using it for this for a year now and I wouldn't
           | want to live without it.
        
             | rco8786 wrote:
             | > For any API you can find an example of someone else using
             | it.
             | 
             | Amazing. This was the light bulb moment for me. I work at
             | [big tech co] and we have an internal code search tool, and
             | anytime I need to use a new API I pull it up to find
             | examples of how it's used.
             | 
             | Now I can do this for the entire world of OSS, amazing.
        
         | Arnavion wrote:
         | I'm pretty sure sourcegraph.com has been doing all that (search
         | all GH repos, exact search in quotes, case-sensitive search,
         | regex search, limit files using filename regex) for longer than
         | 5 years.
        
         | radicality wrote:
         | Is the difference that now the basic search functionality
         | actually works?
         | 
         | The previous standard GitHub search I found to be remarkably
         | bad. I would be looking at some small public repo, search for
         | an exact string match I know exists in the code, scope the
         | search to that repo only, and still see zero results. Even
         | copy-pasting a line of code from a file in the repo often
         | resulted in zero matches.
        
           | AtNightWeCode wrote:
           | I don't get it either. Maybe one needs to enable something. The
           | search is still useless. Locally I use git-grep.
        
             | Arnavion wrote:
             | It seems you still need to enable the feature. Click your
             | user icon in the top right -> choose "Feature preview" in
             | the dropdown -> enable the "New Code Search and Code View"
             | feature.
        
               | bagels wrote:
               | How that is not standard for the last 10 years is
               | baffling.
        
               | Kuinox wrote:
               | Properly indexing/searching such massive data is no easy
               | feat.
        
               | iudqnolq wrote:
               | Indexing cost doesn't scale with number of searchers
        
             | fallat wrote:
             | Same experience here. Are there advertisement accounts
             | posting here or something? Legitimately weird.
        
               | colin353 wrote:
               | Ah, you need to log in to get access to the new code
               | search!
        
             | Trasmatta wrote:
             | You might want to consider replacing git-grep with ripgrep.
        
         | csnover wrote:
         | Absolutely. GitHub Code Search is by far the most valuable
         | online development tool I have used the past year. It is _so_
         | much more useful than Copilot or any of the AI LLMs in my
         | experience.
         | 
         | With Code Search, I have:
         | 
         | * Rewritten a CMake build system, which would have been
         | practically impossible without access to real-world examples
         | because of how poorly designed and documented it is;
         | 
         | * Validated machine-generated translations by looking up
         | language strings from projects that used human translators;
         | 
         | * Tracked down bugs in unfamiliar codebases, using symbol-based
         | navigation, without the downtime of fetching a bunch of files
         | and waiting for a language server to process them locally;
         | 
         | * Reviewed how projects were using the APIs of a library I work
         | on to determine whether high-maintenance features were actually
         | used, and whether tricky features were being used correctly or
         | needed redesigning to reduce programmer error
         | 
         | Kudos to the team at GitHub. Genuinely stellar work.
        
           | neuronexmachina wrote:
           | That's really interesting, do you have any examples handy of
           | search queries you used?
        
             | MaxLeiter wrote:
             | It's amazing for searching examples of rarely used / non-
             | public APIs
        
             | Kuinox wrote:
             | I'm not the one you asked, but here are my usages:
             | 
             | Searching how an API is used.
             | 
             | Searching how people configured something.
             | 
             | Searching accidental uses of an attribute:
             | 
             | https://github.com/dotnet/csharplang/discussions/5657#discu
             | s...
        
         | tex0 wrote:
         | Yes, it comes pretty close. Well done!
        
       | rightbyte wrote:
       | Seems like grep-aaS? Or am I missing something. It could not e.g.
       | get the definition of a C++ function for me. Really useful still,
       | but should have been there years ago.
        
       | rav wrote:
       | I have missed Google Code Search, which launched in 2006 and was
       | discontinued in 2013. Similar to GitHub's code search, it
       | supported searching by regex and filtering by language etc. - but
       | obviously the amount of code to search through is orders of
       | magnitude larger than it was 10-15 years ago. Still I wonder what
       | took GitHub so long to build this - it's hardly a novel idea, and
       | it seems like such an obvious power tool for programmers to have.
        
         | SoKamil wrote:
         | > Still I wonder what took GitHub so long to build this - it's
         | hardly a novel idea, and it seems like such an obvious power
         | tool for programmers to have.
         | 
         | Scale. You can read more about their engineering at
         | https://github.blog/2023-02-06-the-technology-behind-githubs...
        
         | esprehn wrote:
         | You could try https://sourcegraph.com/search
         | 
         | Though GitHub will probably have all the same features
         | eventually and cheaper too.
        
       | AtNightWeCode wrote:
       | How does one access the new search?
        
         | linhns wrote:
         | Should be available to all users now, previously it was in beta
        
           | AtNightWeCode wrote:
           | I cannot detect any difference.
        
       | spyremeown wrote:
       | >Without code search, you might have to clone a bunch of
       | repositories and grep through them
       | 
       | Why is this an issue? I do have a clone of all my company repos,
       | doesn't everyone? Memory is cheap and ripgrep/ag are fast.
        
         | [deleted]
        
         | sk0g wrote:
         | No, every microservice or new project/tool gets its own
         | repository. There are new ones being created quite frequently, so
         | I have ~10 repositories cloned that most relate to the work I
         | do, but nothing beyond that.
        
         | edflsafoiewq wrote:
         | Bandwidth is not cheap for many of us.
        
           | henryfjordan wrote:
           | how big are the repos at your company?
           | 
           | Bitbucket has a 2GB max size on each repo, I'd suspect other
           | providers have similar restrictions. Would be a problem on
           | mobile but repos that hit that limit are rare.
        
       | chx wrote:
       | Don't get me wrong but how is this any better than say Scintilla
       | searching locally? I feel like the previous one was very bad and
       | this is baseline but please tell me where I am wrong. From a
       | brief test this looks like something you'd have built with
       | Sphinxsearch a decade plus ago?
        
       | perpil wrote:
       | The time to pull request with a good code search, paired with
       | GitHub.dev and Copilot is really a force multiplier. What a time
       | to be alive.
        
       | thih9 wrote:
       | I initially thought "generally available" refers to removing the
       | login wall. I remember when you didn't have to sign in to search
       | code on GitHub; I miss that.
        
         | marginalia_nu wrote:
         | Probably due to bot abuse. Can't have nice things on the
         | internet.
         | 
         | Requiring an account makes rate limiting vs botnets easier.
        
       | kazinator wrote:
       | It isn't generally available; it requires login.
        
       ___________________________________________________________________
       (page generated 2023-05-08 23:01 UTC)