[HN Gopher] GitHub code search is generally available
___________________________________________________________________
GitHub code search is generally available
Author : todsacerdoti
Score : 169 points
Date : 2023-05-08 16:01 UTC (6 hours ago)
(HTM) web link (github.blog)
| j1elo wrote:
| I'm happy that at last, my Stack Overflow question and answer
| have been fully solved with technology improvements!
|
| https://stackoverflow.com/questions/43891605/search-partial-...
|
| It's been almost _6 years_, though... for a search scenario that
| would be trivial to implement with _grep_ (at scale that's
| another thing...) Still, a nice example of perfect being the
| enemy of good, I guess.
| 100k wrote:
| Thanks for your patience! It has been a long road with some
| dead ends (we've wanted to add this since 2012 at least). We
| actually wrote about why we didn't just use grep in our last
| blog post: https://github.blog/2023-02-06-the-technology-
| behind-githubs...
| RulerOf wrote:
| I really enjoyed that blog post. Especially the comparisons
| that lay out how feasible it can actually be to do the
| stupid, simple thing of "grep the whole data set every time"
| up to a surprising point.
| synergy20 wrote:
| hold on, I paid for copilot, what does github-code-search buy
| me? not to mention I could already kind of search its code in
| the past.
| abathur wrote:
| I generally like the new code search, but I've got one big gripe:
| there's no way to sort code results by any kind of proxy for
| recency.
|
| The old code search had the ability to sort by indexed date. This
| wasn't perfect, but it was something.
|
| I like keeping up with who's using my code and whether they're
| leaving comments or commit chains that outline trouble they're
| having with it. Sometimes old code pops up in the recently-
| indexed sort, but if I regularly search and look at the top page,
| I can see _most_ new uses.
|
| Without it, code search is basically useless for this purpose :/
| 100k wrote:
| (I work on code search.) Yeah, sorry about that. We've heard
| this feedback a lot. There are two reasons why we haven't
| implemented this. First, content is now shared between
| repositories (previously it wasn't), which makes this harder.
| Second, we rebuild the index weekly or even more frequently, so
| the proxy of "when was this added" that was used doesn't work
| any more. What we would _like_ to use is "when was this blob
| added to this branch", but that's extremely expensive to
| retrieve from Git because Git trees don't record it.
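To show why "when was this blob added to this branch" is expensive: a Git commit points at a full tree snapshot and records no per-entry timestamps. Answering the question therefore means walking history backwards until the blob at that path changes. The following is a minimal sketch under simplifying assumptions: a hypothetical in-memory model of a linear branch (no merges), with each commit reduced to a `{path: blob_id}` mapping. It is not how GitHub's index works. It only illustrates that each lookup costs time proportional to the length of the history.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model of a linear branch, oldest commit first.
# Each entry is (commit_id, tree), where tree maps path -> blob id,
# mirroring how a Git commit references a full tree snapshot.
History = List[Tuple[str, Dict[str, str]]]


def commit_that_added_blob(history: History, path: str) -> Optional[str]:
    """Return the commit that introduced the blob currently at `path`.

    Trees do not record when an entry appeared, so we walk backwards
    from the branch head until the blob at `path` differs. This costs
    O(len(history)) per lookup, for every file on every branch.
    """
    if not history:
        return None
    head_blob = history[-1][1].get(path)
    if head_blob is None:
        return None  # path does not exist at the branch head
    added_in = history[-1][0]
    # Step back through older commits while the blob is unchanged.
    for commit_id, tree in reversed(history[:-1]):
        if tree.get(path) != head_blob:
            break  # blob was different (or absent) before this commit
        added_in = commit_id
    return added_in
```

With merges, the walk becomes a graph traversal over every parent. That makes the cost worse, not better.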
| ofek wrote:
| Does this mean it will not be implemented?
| 100k wrote:
| We want to do it right if we do implement it, but I can't
| promise anything concretely. It's not trivial,
| unfortunately.
| swyx wrote:
| maybe lower standards would help - what does doing it
| not-quite-right-but-in-one-week look like? can mitigate
| by setting expectations accordingly
| colin353 wrote:
| I'm Colin from GitHub's code search team, happy to answer any
| questions.
|
| For more info on how we built this, you can check out our
| technical blog post from a few months ago
| https://github.blog/2023-02-06-the-technology-behind-githubs...
| airstrike wrote:
| I imagine a future in which this is integrated into vscode so I
| can go from an error message in the terminal to a search
| through my code + third-party modules that my code is importing
| davidrjenni wrote:
| How does it compare to Sourcegraph? What is the main
| differentiator?
| thinkingemote wrote:
| Great stuff! Is there any update on searches that include code
| in branches? I often manually find interesting work done in
| development branches of forks of the repo I'm focused on, work
| which never sees the light of day and doesn't show up in search.
| I imagine making the fork network part of the search would be a
| good facet.
| lexh wrote:
| Will this make it to GH enterprise eventually?
| colin353 wrote:
| Yes, we're working on bringing it to GitHub enterprise right
| now.
| lexh wrote:
| Fantastic. Love the functionality on public GH and always
| find myself missing it at work.
| catchmeifyoucan wrote:
| Loving the new Code Search! Might be super specific, but is
| there any syntax for searching attributes in HTML elements? For
| example, given a React component like <Button ...some-props
| color="red" />, what's the best way to find all the buttons that
| are red?
| colin353 wrote:
| Hmm, you can construct a regular expression, something like:
| lang:tsx /<Button[^\\].\* color="red"/
|
| Example:
|
| https://github.com/search?q=lang%3Atsx+%2F%3CButton%5B%5E%5C.
| ..
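Here is one possible pattern for the "red Button" question, offered as a hedged alternative rather than the one from the comment. It assumes the goal is to match a `<Button` tag whose attributes include `color="red"` before the tag's closing `>`. The pattern `<Button\b[^>]*\bcolor="red"` is my own. GitHub's regex dialect may differ from Python's `re`, so try it in the search box before relying on it.

```python
import re
from typing import List

# Hypothetical pattern: "<Button", a word boundary (so <ButtonGroup is
# excluded), any run of non-'>' characters (newlines included), then
# color="red". Limitation: a '>' inside a prop value, such as an arrow
# function `=>`, ends the match early, so those tags are missed.
RED_BUTTON = re.compile(r'<Button\b[^>]*\bcolor="red"')


def find_red_buttons(source: str) -> List[str]:
    """Return the matched opening-tag prefixes of red <Button> elements."""
    return [m.group(0) for m in RED_BUTTON.finditer(source)]
```

In GitHub's search box, the equivalent query would presumably be `lang:tsx /<Button\b[^>]*\bcolor="red"/`, assuming `\b` is supported there.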
| arthurcolle wrote:
| You can do code blocks with 4 consecutive spaces. Backticks
| are not supported, unfortunately.
| catchmeifyoucan wrote:
| Ooh nice, this looks great! Thanks!!
| panic wrote:
| Are you planning to release code search as open source? I can't
| find a link to the source code anywhere.
| jhgg wrote:
| It would be really awesome if code search could one day consume
| LSIF for precise results in its index, similar to Sourcegraph.
| The symbol search is good now, but approximate. Letting devs
| upload LSIF data from their CI pipelines would allow precise
| symbol search (go to definition / find usages actually being
| accurate) and remove irrelevant results.
| colin353 wrote:
| Great point. Yes, we initially focused on zero-config
| approximate code navigation. But we do intend to support
| build-based code navigation in the future, since the
| approximate code navigation experience can be pretty poor for
| some languages (e.g. C/C++).
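Here is a toy illustration of why name-based ("approximate") navigation can be imprecise. If go-to-definition simply looks up every `def <name>(` in the indexed files, two unrelated functions that share a name cannot be told apart. A build-based index (for example, LSIF produced in CI) would instead know which definition a given call site resolves to. The sketch below is a hypothetical, purely lexical lookup over Python sources, not GitHub's implementation.

```python
import re
from typing import Dict, List, Tuple


def approximate_definitions(files: Dict[str, str], name: str) -> List[Tuple[str, int]]:
    """Return (path, line_number) for every `def <name>(` found by text matching.

    Purely lexical: imports, scopes and types are ignored, so the lookup
    cannot tell which candidate a particular call actually refers to.
    """
    pattern = re.compile(r"\s*def\s+" + re.escape(name) + r"\s*\(")
    hits = []
    for path, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.match(line):  # match() anchors at line start
                hits.append((path, lineno))
    return hits
```

A call to `load` in one file gets two candidate definitions here, even if its import makes the answer unambiguous to a compiler. That gap is the one build-based navigation closes.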
| dmix wrote:
| Any plans to add support for Vue SFC syntax highlighting?
|
| Edit: correction, it looks like that's been fixed since last
| week, nm
| ren_engineer wrote:
| Surprised more people aren't talking about how Microsoft has a
| near monopoly on the developer ecosystem. They've got GitHub,
| OpenAI, and VS Code all working together and collecting data
| that strengthens each other's products, they're using their
| embrace, extend, extinguish strategy with WSL, and all of these
| steer people towards Azure services whenever possible. Seems
| like something that verges on an antitrust situation when you
| think about the flywheel effect data has for AI.
|
| credit to Microsoft for rehabbing their reputation with
| developers but it seems like a massive trojan horse
| synergy20 wrote:
| except windows itself is not loved by most developers.
| microsoft will take over the developer world when it fully
| replaces windows with linux (instead of WSL2, which is nice but
| not great)
| charlieyu1 wrote:
| Can't see Linux replacing Windows; their most profitable
| products (Windows and MS Office) are both based on a closed
| ecosystem.
| rad_gruchalski wrote:
| I don't know. I'm happily using ms office on a mac and in
| the browser.
| Shared404 wrote:
| Isn't Azure a much higher percentage of profit than Windows
| or Office for MS at this point?
| tcmart14 wrote:
| I don't know the numbers by heart. But if it isn't a
| higher percentage of profit right now, it certainly takes
| the cake for largest growth percentage, and it's set to eclipse
| everything else soon (if it hasn't already).
| waboremo wrote:
| Part of the reason Windows isn't loved by developers is also
| hardware. So a switch to Linux won't fix this, unless they
| made the switch when Apple was releasing those horrific
| keyboards!
| manojlds wrote:
| Meanwhile CMA: you are leading cloud gaming and hence can't
| acquire Activision.
| tester756 wrote:
| "monopoly on developer ecosystem"
|
| GitHub? fair
|
| OpenAI? how is this a part of dev. ecosystem?
|
| VS Code? wtf? there are a lot of other IDEs/editors and many
| would argue that they are better
|
| >embrace, extend, extinguish strategy with WSL
|
| They are EEEing their product - Windows?
| VirusNewbie wrote:
| I mean google uses VSCode internally as their officially
| supported IDE... i'd say they're doing pretty well.
| capableweb wrote:
| > > embrace, extend, extinguish strategy with WSL
|
| > They are EEEing their product - Windows?
|
| No, Linux obviously.
|
| First they like and integrate Linux into their own products.
| Azure, WSL and others.
|
| Then, they provide extensions that are closed-source on top
| of those.
|
| With the goal to extinguish the original project so they have
| more control over the direction.
| linhns wrote:
| I think there was a lot of discussion on this when Microsoft
| took over GitHub, but as time went on people kind of accepted
| the reality.
| AnonMO wrote:
| Github -> gitlab, vs code -> jetbrains. why use wsl? just go to
| linux. No one forces you to use their products; they're just a
| better developer experience imo, aside from windows. Plenty of
| competition in the space. The question is: does better equal
| monopoly?
| chillel wrote:
| I guess Steve Ballmer was right all along...
| misterprime wrote:
| About developers?
| jansan wrote:
| No, about Playday:
| https://www.youtube.com/watch?v=V7PYQCXdX3A
| misterprime wrote:
| Highly amusing.
| zzzzzzzza wrote:
| gitea
| synergy20 wrote:
| good for small projects, I guess the real competitor is
| gitlab
| makapuf wrote:
| ... is nice but needs the social network effects. Maybe
| adding some federation and stars / comments as a protocol
| (not just a program) could help. Maybe it exists and lacks
| coherence/ publicity.
| pydry wrote:
| One of the best things that could be done for open source
| is to break the monopoly on those damn stars.
|
| Open source would be a lot healthier if social proof were
| portable across platforms.
| evilspammer wrote:
| ...who has ever cared about stars?
| rad_gruchalski wrote:
| "Our product has 5000 stars on GitHub, therefore give us
| money, it's a business opportunity." Seen plenty of those
| on LinkedIn.
| JohnFen wrote:
| > more people aren't talking about how Microsoft has a near
| monopoly on the developer ecosystem.
|
| But do they? In my day job, outside of the occasional use of
| Visual Studio and developing on a Windows machine, I use no
| Microsoft products for development.
|
| > credit to Microsoft for rehabbing their reputation with
| developers
|
| With a fair number of younger developers, but certainly not
| all. Most devs I know don't think of Microsoft any more kindly
| now than in the past.
| tcmart14 wrote:
| I don't think they have one today, but it is looking like
| they could soon. Especially since I believe recent reports
| show that Azure is starting to outpace AWS in adoption?
| So you have .NET, Visual Studio (Code), Azure, GitHub,
| OpenAI, Windows, and I'm pretty sure more that I'm forgetting
| about. I think the big one that wasn't initially mentioned
| was Azure.
| JohnFen wrote:
| All of that adds up to having a very significant share,
| perhaps a majority, of the market locked up. But it's also all
| centering around a particular sort of product and product
| development. Microsoft might be able to lock up that
| segment, but I don't think they're in a position to
| monopolize the larger software development space in the
| near or medium future.
| capableweb wrote:
| Don't forget TypeScript and npm as well which basically covers
| 99% of the JavaScript ecosystem if not more.
|
| Then Dependabot for large swaths of developers outside
| of the earlier mentioned ecosystems. LinkedIn for everyone's
| career.
| dahwolf wrote:
| I would expect them to purchase StackOverflow too. I guess it's
| not that essential, seems a low cost acquisition.
| Kuinox wrote:
| Why would they purchase StackOverflow? They are partnered
| with the StackOverflow killer: ChatGPT.
| atq2119 wrote:
| If StackOverflow dies, surely the developer-relevant
| quality of training data will suffer?
|
| After all, the capabilities of ChatGPT are basically
| proportional to how well a topic is represented in the
| training data, which is largely the internet.
| paulddraper wrote:
| Now they just need StackOverflow.
|
| They already run a .NET stack....
| rj1 wrote:
| i think it extends further than that, since they have: vscode,
| github, linkedin, npm, typescript, chatgpt. for many, this is
| almost the entire developer ecosystem.
|
| at a high level they pretend to embrace open source but many of
| the best features of vscode are closed source, such as remote
| editing and various language servers (pylance, etc.) the lsp
| saga is particularly unfriendly, since they pushed it as an
| open standard, tons of people contributed and adopted it, and
| then they closed the source to their most valuable language
| servers, making them only compatible with their product
| (vscode).
|
| there are countless similar examples. the way i see microsoft
| and the way they want to be perceived are entirely different.
| aetherane wrote:
| I don't get the praise over GitHub code search. I find it very
| inaccurate and often missing references etc. Maybe it depends on
| the language you are using it with? (Go here)
| jacobr1 wrote:
| Have you tried the new search that was just launched? It seems
| to have significantly improved search accuracy. I agree the old
| version wasn't that great (though still was one of the better
| options for finding usage of things in the wild like rarely
| used OSS dependencies I needed to debug).
| aetherane wrote:
| I was in the beta, which I'm assuming was the same. I find it
| sometimes misses references in the same folder as the file
| I'm looking at when I do reference search.
| bdcravens wrote:
| Are you referring to what they just released? Github code
| search has always been notoriously bad. This is a new search
| product:
|
| "today, our new code search and code view are generally
| available to all users on GitHub.com"
| unicornmama wrote:
| RIP Sourcegraph
| midoBB wrote:
| In my experience Sourcegraph offers a better integration with
| Gitlab and Github both. And their code search is far superior.
| h1fra wrote:
| I have been using it since the beta, truly the most impressive
| product released in the last 5 years (along with chatgpt). The
| amount of indexed code, the quickness and the precision of this
| search is simply stunning.
| jjeaff wrote:
| Could you elaborate more on why this is a significant feature?
|
| I can see how it would be handy to search a codebase online,
| especially one you don't have cloned locally, but for my own
| codebases, I can search the entire thing just fine in VS Code
| with Ctrl+Shift+F.
| doodlesdev wrote:
| It's not only about searching your own repository, it allows
| you to search through every single public repository on
| GitHub. I personally use it a lot to learn more obscure APIs
| which are badly documented or which I'm just not used to,
| simply search for the method I'm trying to use and find
| infinite examples of real world usage, along with the code
| license right next to it.
|
| It's also great if you drank the GitHub kool-aid as you can
| do a single search and find related code snippets, issues,
| pull requests and discussions that could possibly help. I'm
| personally not too big into the ecosystem, in fact I'm
| considering moving to Fossil so I can have everything inside
| the repo, but for those who are it's a great feature.
| h1fra wrote:
| I think others have pretty much summarised it already:
|
| Searching for non-documented or badly documented API, find
| implementation of an algorithm or specific pattern, find how
| people are using a niche tool, etc.
|
| I have even used it to find my own API in the wild to look
| for potential breaking changes and improvements to make.
| JansjoFromIkea wrote:
| For me it's been very useful at finding codebases with work
| done on extremely specific niche things that would've been
| near impossible to find otherwise (e.g. tools for obscure
| protocols hidden away on obtusely named repos)
| simonw wrote:
| Ever wanted to use an API and found the documentation to be
| lacking?
|
| GitHub code search pretty much solves that. For any API you
| can find an example of someone else using it.
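A search for real-world usages of an API can also be turned into a shareable link. This sketch assumes GitHub's web search URL shape: the query goes in `q`, and `type=code` selects code results. The `type=code` parameter is an assumption on my part and is not stated in the thread. `urlencode` applies `quote_plus`, so spaces become `+` and `:` becomes `%3A`, as in the example link earlier in the thread.

```python
from urllib.parse import urlencode


def code_search_url(query: str) -> str:
    """Build a GitHub code search link for `query`.

    Assumption: github.com/search reads the query from `q`, and
    `type=code` restricts results to code.
    """
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})
```

For example, `code_search_url('lang:python "requests.Session("')` yields a link that searches for literal usages of that constructor in Python files.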
|
| I've been using it for this for a year now and I wouldn't
| want to live without it.
| rco8786 wrote:
| > For any API you can find an example of someone else using
| it.
|
| Amazing. This was the light bulb moment for me. I work at
| [big tech co] and we have an internal code search tool, and
| anytime I need to use a new API I pull it up to find
| examples of how it's used.
|
| Now I can do this for the entire world of OSS, amazing.
| Arnavion wrote:
| I'm pretty sure sourcegraph.com has been doing all that (search
| all GH repos, exact search in quotes, case-sensitive search,
| regex search, limit files using filename regex) for longer than
| 5 years.
| radicality wrote:
| Is the difference that now the basic search functionality
| actually works?
|
| The previous standard GitHub search I found to be remarkably
| bad. I would be looking at some small public repo, search for
| an exact string match I know exists in the code, scope the
| search to that repo only, and still see zero results. Even
| copy-pasting a line of code from a file in the repo often
| resulted in zero matches.
| AtNightWeCode wrote:
| I don't get it either. Maybe one needs to enable something. The
| search is still useless. Locally I use git-grep.
| Arnavion wrote:
| It seems you still need to enable the feature. Click your
| user icon in the top right -> choose "Feature preview" in
| the dropdown -> enable the "New Code Search and Code View"
| feature.
| bagels wrote:
| How that is not standard for the last 10 years is
| baffling.
| Kuinox wrote:
| Properly indexing and searching such massive data is no easy
| feat.
| iudqnolq wrote:
| Indexing cost doesn't scale with number of searchers
| fallat wrote:
| Same experience here. Are there advertisement accounts
| posting here or something? Legitimately weird.
| colin353 wrote:
| Ah, you need to log in to get access to the new code
| search!
| Trasmatta wrote:
| You might want to consider replacing git-grep with ripgrep.
| csnover wrote:
| Absolutely. GitHub Code Search is by far the most valuable
| online development tool I have used in the past year. It is _so_
| much more useful than Copilot or any of the AI LLMs in my
| experience.
|
| With Code Search, I have:
|
| * Rewritten a CMake build system, which would have been
| practically impossible without access to real-world examples
| because of how poorly designed and documented it is;
|
| * Validated machine-generated translations by looking up
| language strings from projects that used human translators;
|
| * Tracked down bugs in unfamiliar codebases, using symbol-based
| navigation, without the downtime of fetching a bunch of files
| and waiting for a language server to process them locally;
|
| * Reviewed how projects were using the APIs of a library I work
| on to determine whether high-maintenance features were actually
| used, and whether tricky features were being used correctly or
| needed redesigning to reduce programmer error
|
| Kudos to the team at GitHub. Genuinely stellar work.
| neuronexmachina wrote:
| That's really interesting, do you have any examples handy of
| search queries you used?
| MaxLeiter wrote:
| It's amazing for searching examples of rarely used / non-
| public APIs
| Kuinox wrote:
| I'm not the one you asked, but here are my usages:
|
| Searching how an API is used.
|
| Searching how people configured something.
|
| Searching accidental uses of an attribute:
|
| https://github.com/dotnet/csharplang/discussions/5657#discu
| s...
| tex0 wrote:
| Yes, it comes pretty close. Well done!
| rightbyte wrote:
| Seems like grep-aaS? Or am I missing something? It could not,
| e.g., get the definition of a C++ function for me. Really useful
| still, but it should have been there years ago.
| rav wrote:
| I have missed Google Code Search, which launched in 2006 and was
| discontinued in 2013. Similar to GitHub's code search, it
| supported searching by regex and filtering by language etc. - but
| obviously the amount of code to search through is orders of
| magnitude larger than it was 10-15 years ago. Still I wonder what
| took GitHub so long to build this - it's hardly a novel idea, and
| it seems like such an obvious power tool for programmers to have.
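Regex search over a large corpus is usually made fast with an index. Russ Cox's write-up "Regular Expression Matching with a Trigram Index" describes the approach Google Code Search used. Every 3-character substring is indexed, the posting lists narrow the set of candidate documents, and only those candidates are checked with the real matcher. Below is a minimal sketch of that idea for a literal (non-regex) query. It does not reproduce GitHub's own ngram scheme, which is described in the blog post linked in this thread.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """All distinct 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    """Toy inverted index mapping each trigram to the ids of docs containing it."""

    def __init__(self, docs: Dict[str, str]):
        self.docs = docs
        self.postings: Dict[str, Set[str]] = defaultdict(set)
        for doc_id, text in docs.items():
            for gram in trigrams(text):
                self.postings[gram].add(doc_id)

    def search(self, literal: str) -> List[str]:
        """Return sorted ids of documents containing `literal` as a substring."""
        grams = trigrams(literal)
        if not grams:
            # Queries shorter than 3 chars can't use the index: scan everything.
            candidates = set(self.docs)
        else:
            # Intersect posting lists: only docs containing every trigram survive.
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Verify, because containing all trigrams doesn't guarantee a match.
        return sorted(d for d in candidates if literal in self.docs[d])
```

The verification step matters. The document `"abcd xbcde"` contains all three trigrams of `"abcde"` but not the string itself.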
| SoKamil wrote:
| > Still I wonder what took GitHub so long to build this - it's
| hardly a novel idea, and it seems like such an obvious power
| tool for programmers to have.
|
| Scale. You can read more about their engineering at
| https://github.blog/2023-02-06-the-technology-behind-githubs...
| esprehn wrote:
| You could try https://sourcegraph.com/search
|
| Though GitHub will probably have all the same features
| eventually and cheaper too.
| AtNightWeCode wrote:
| How does one access the new search?
| linhns wrote:
| Should be available to all users now; previously it was in beta
| AtNightWeCode wrote:
| I cannot detect any difference.
| spyremeown wrote:
| >Without code search, you might have to clone a bunch of
| repositories and grep through them
|
| Why is this an issue? I do have a clone of all my company repos,
| doesn't everyone? Memory is cheap and ripgrep/ag are fast.
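The local approach described here amounts to walking every cloned repository and running a regex over each file. ripgrep and ag do this much faster, using parallelism, `.gitignore` handling and optimized literal search. Here is a plain-Python sketch of the same idea, assuming all clones live under one hypothetical root directory:

```python
import os
import re
from typing import Iterator, Tuple


def grep_clones(root: str, pattern: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for lines matching `pattern` under `root`.

    Skips .git directories and files that aren't valid UTF-8. Unlike
    ripgrep, it does not honour .gitignore or search in parallel.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # prune in place
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as f:
                    for lineno, line in enumerate(f, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file
```

This only covers repositories you have already cloned and kept up to date. Hosted code search avoids that bookkeeping.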
| sk0g wrote:
| No. Every microservice or new project/tool gets its own
| repository. New ones are created quite frequently, so I have
| the ~10 repositories most related to the work I do cloned,
| but nothing beyond that.
| edflsafoiewq wrote:
| Bandwidth is not cheap for many of us.
| henryfjordan wrote:
| how big are the repos at your company?
|
| Bitbucket has a 2GB max size on each repo, I'd suspect other
| providers have similar restrictions. Would be a problem on
| mobile but repos that hit that limit are rare.
| chx wrote:
| Don't get me wrong, but how is this any better than, say, Scintilla
| searching locally? I feel like the previous one was very bad and
| this is baseline but please tell me where I am wrong. From a
| brief test this looks like something you'd have built with
| Sphinxsearch a decade plus ago?
| perpil wrote:
| The time to pull request with a good code search, paired with
| GitHub.dev and Copilot is really a force multiplier. What a time
| to be alive.
| thih9 wrote:
| I initially thought "generally available" refers to removing the
| login wall. I remember when you didn't have to sign in to search
| code on GitHub; I miss that.
| marginalia_nu wrote:
| Probably due to bot abuse. Can't have nice things on the
| internet.
|
| Requiring an account makes rate limiting vs botnets easier.
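A common way to implement this is a per-key token bucket. With logins, the key is the account. Anonymous botnet traffic is spread across many IPs and is much harder to key on. Below is a minimal sketch with hypothetical capacity and refill parameters. GitHub's actual limits and mechanism are not described in this thread.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-key token bucket: each key holds up to `capacity` tokens,
    refilled continuously at `refill_per_sec`; each request spends one."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the limiter can be tested
        self.state: Dict[str, Tuple[float, float]] = {}  # key -> (tokens, last_ts)

    def allow(self, key: str) -> bool:
        now = self.clock()
        tokens, last = self.state.get(key, (self.capacity, now))
        # Refill for the time elapsed since this key was last seen, capped.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1:
            self.state[key] = (tokens - 1, now)
            return True
        self.state[key] = (tokens, now)
        return False
```

Calling `allow(account_id)` before each search would throttle a single account, no matter how many IPs it uses.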
| kazinator wrote:
| It isn't generally available; it requires login.
___________________________________________________________________
(page generated 2023-05-08 23:01 UTC)