[HN Gopher] Improving GitHub Code Search
___________________________________________________________________
Improving GitHub Code Search
Author : todsacerdoti
Score : 324 points
Date : 2021-12-08 17:01 UTC (5 hours ago)
(HTM) web link (github.blog)
(TXT) w3m dump (github.blog)
| zxienin wrote:
| I have 10 different git + github instances across my org. (~50k
| strong workforce, pre github repos, m&a etc). Does this cs offer
| aggregated searches across all those distributed repos?
| dstaheli wrote:
| Hi zxienin. I'm a GitHub product manager. May I assume the
| GitHub instances you're describing are GitHub Enterprise Server
| instances? We plan to bring advanced code search features to
| all GitHub plans including Enterprise Server once we've
| stabilized the UX and feature set. But it sounds like your
| situation goes beyond that, where the search needs to include
| code from Git repositories outside of GitHub Enterprise Server.
| That makes good sense, and we'll definitely consider it. If you
| want to keep in touch about it, please feel free to post in our
| feedback forum:
| https://github.com/github/feedback/discussions/categories/co....
| Thank you!
| zxienin wrote:
| I shall, thnx.
|
| ps: yes, enterprise server instances
| jpgvm wrote:
| Given the shoutouts to Burntsushi and Lemire this is almost
| certainly a bitmap trigram index based engine similar to
| https://github.com/google/zoekt
|
| The index is likely based on Roaring bitmaps, presumably
| https://github.com/RoaringBitmap/roaring-rs in this case.
|
| Nice architecture, exactly how I would have done it also.
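A minimal sketch of the trigram-index idea this comment guesses at. It is not GitHub's implementation: plain Python sets stand in for Roaring bitmaps, and `build_index`, `search`, and the sample documents are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in trigrams(text):
            index[gram].add(doc_id)
    return index


def search(index, docs, query):
    """Return sorted ids of documents that contain query as a substring."""
    grams = trigrams(query)
    if not grams:
        # Queries shorter than three characters cannot use the index.
        return sorted(d for d, text in docs.items() if query in text)
    # Intersect the posting sets; a missing trigram yields no candidates.
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Candidates can be false positives, so confirm with a real substring check.
    return sorted(d for d in candidates if query in docs[d])


docs = {
    1: 'fn main() { println!("hello"); }',
    2: "def main():\n    print('hello')",
    3: "int main(void) { return 0; }",
}
index = build_index(docs)
print(search(index, docs, "println"))
```

The intersection step is where a compressed bitmap would pay off at scale: each posting set becomes a bitmap and the intersection becomes a bitwise AND.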
| rurban wrote:
| Nope, I would have used an existing search solution, like
| xapian. It does so much more, and much faster.
|
| You need to support a proper query syntax, with tags, rankings,
| stopwords, stemming. Then you need to have a proper db backend
| (reverse indices). Trigrams don't help for regex. Then a
| templated representation. Google codesearch would do only the
| 2nd of 3. ElasticSearch is commercial, and only java.
|
| Doing that from scratch is a bit silly.
| tanoku wrote:
| Oh OK, you have clearly spent more time thinking about this
| problem than the team of engineers at GitHub who've been
| researching code search at scale for more than four years. I
| bet they feel real silly right now knowing they could have
| shipped this search engine in a couple weeks taping together
| off-the-shelf libraries if they only had your talent for
| software architecture.
| preseinger wrote:
| Code search typically does not need many (most?) full-text
| search features like TF-IDF, stopwords, stemming, tagging,
| etc. It's a categorically different domain.
| yencabulator wrote:
| > Trigrams don't help for regex.
|
| https://swtch.com/~rsc/regexp/regexp4.html
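A rough sketch of the point behind that link: trigrams can act as a prefilter for regex search, with the real regex run only on the surviving candidates. This is a deliberately simplified approximation, not the algorithm from the article. It only looks at plain literal runs and gives up (returns `None`) on alternation, groups, classes, braces, or escapes. `required_trigrams` and `regex_search` are hypothetical names.

```python
import re

UNSUPPORTED = set("|()[]{}\\")  # constructs this sketch refuses to analyse
QUANTIFIERS = set("?*+")


def required_trigrams(pattern):
    """Return trigrams every match must contain, or None if unknown."""
    if any(ch in UNSUPPORTED for ch in pattern):
        return None  # too complex here: the caller must scan everything
    runs, current = [], ""
    for i, ch in enumerate(pattern):
        quantified = i + 1 < len(pattern) and pattern[i + 1] in QUANTIFIERS
        if (ch.isalnum() or ch == "_") and not quantified:
            current += ch  # a literal character that must appear exactly once
        else:
            runs.append(current)  # anything else ends the literal run
            current = ""
    runs.append(current)
    return {run[i:i + 3] for run in runs for i in range(len(run) - 2)}


def regex_search(docs, pattern):
    """Return sorted ids of docs matching pattern, skipping impossible docs."""
    needed = required_trigrams(pattern)
    compiled = re.compile(pattern)
    hits = []
    for doc_id, text in docs.items():
        if needed and not all(gram in text for gram in needed):
            continue  # cannot match; a real engine would consult its index
        if compiled.search(text):
            hits.append(doc_id)
    return sorted(hits)


docs = {1: "Google Code Search", 2: "Google Reader", 3: "grep search"}
print(regex_search(docs, "Google.*Search"))
```

Here `Google.*Search` requires the trigrams of both `Google` and `Search`, so documents 2 and 3 are rejected before the regex engine runs at all.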
| v1g1l4nt3 wrote:
| Related: https://srcgr.ph/zoekt-memory-optimizations
| kevinsundar wrote:
| Security researchers are gonna love this :)
|
| Time to go secrets and url hunting.
| jayflux wrote:
| You could already do this, grep.app for example has existed for
| a while. This is just bringing those features in-house.
| latenightcoding wrote:
| I always thought Github's bad search functionality was a business
| decision. It was so bad for so long. Even if basic improvements
| are significantly harder at their scale, I just can't comprehend
| how Microsoft left something so potentially useful be so bad for
| so long.
| v1g1l4nt3 wrote:
| Yeah, I'd bet on https://about.sourcegraph.com. Fully focused
| on code search and are still light years ahead.
| ElectronShak wrote:
| Reminds me of https://grep.app, Search across a half million git
| repos [1]
|
| 1. https://news.ycombinator.com/item?id=22396824
| tuananh wrote:
| I still don't understand why it can be so far, haha
| einpoklum wrote:
| They haven't implemented wildcard search for... well, ever:
|
| https://github.com/isaacs/github/issues/402
|
| I don't even care it's very fast. Just make it work. I just hope
| this isn't snake oil. Weird that they claim regex support but no
| wildcard support.
| heipei wrote:
| Curious if this is something completely bespoke or simply a beefy
| ElasticSearch cluster which uses the (relatively) new "wildcard"
| field for enabling regex search on select fields. The search
| syntax certainly maps 1:1 to the ElasticSearch Query String
| syntax, including phrase search, boolean operations, grouping,
| regex search, etc.
| 100k wrote:
| (I worked on this and the prior version of code search that
| uses Elasticsearch.)
|
| It is a custom search engine, built from the ground up for
| code. We'll be sharing more details about it on the GitHub blog
| soon.
| anarazel wrote:
| Some logic to exclude duplicate results would be useful. I often
| search to see how many external users there are of some API in
| postgres. But there's hundreds of separate repos with similar
| contents showing up in the search results...
| adamnemecek wrote:
| I use github search a lot and this would be an insane
| productivity boost. I signed up for the waitlist. Does anyone
| working at Github want to bump me in the queue? This is my
| profile https://github.com/adamnemecek/
| adamnemecek wrote:
| I just got access to it. I'm not sure if someone here helped
| but if yes, then thank you very much.
| v1g1l4nt3 wrote:
| You can skip the wait and use https://sourcegraph.com/search
| instead.
| jshier wrote:
| Got into the preview, can finally search for actual code! One
| thing I'd like to see, though, is the ability to mark directories
| to be ignored in the search results. No one needs to search the
| raw HTML of my generated documentation, yet it shows up in every
| search for project symbols. And since HTML is considered
| "source", I can't filter it out unless I select a particular
| language.
|
| Also the search text field is a bit messed up in Safari when the
| text gets longer than the field.
| colin353 wrote:
| GitHub Code Search developer here - try creating a custom scope
| to filter out that stuff! Click on the scopes dropdown and
| scroll to the bottom. You can filter out HTML by using a query
| like:
|
| NOT language:html
| jshier wrote:
| Ah, I was trying language:!html.
|
| Would still be great to ignore my docs directory.
| esprehn wrote:
| It would be great if this used the same filter format as
| sourcegraph and other internal code search tools. ex.
| -file:.html is enough to filter away files ending in html in
| the main search box.
|
| Having to use dropdowns and multiple input fields is more
| cumbersome than the filter language of repo:, file:, lang:
| etc.
| adamnemecek wrote:
| I hope they add deduplication. I can't count the number of times
| when I get 100 pages of results where 95 pages are from the same
| included library.
| 100k wrote:
| (I worked on this.)
|
| This is on our radar! We de-duplicate exact matches now, but
| we'd like to do the same for near-similar documents.
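One simple way to de-duplicate exact matches is to hash file contents and keep the first result per hash. This is only a sketch under that assumption; the comment does not say how GitHub does it, and `dedupe_exact` and the sample tuples are hypothetical.

```python
import hashlib


def dedupe_exact(results):
    """Keep the first (repo, path, content) result for each distinct content."""
    seen = set()
    unique = []
    for repo, path, content in results:
        # Byte-identical files produce the same SHA-256 digest.
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((repo, path, content))
    return unique


results = [
    ("alice/app", "vendor/lib.c", "int add(int a, int b) { return a + b; }"),
    ("bob/tool", "third_party/lib.c", "int add(int a, int b) { return a + b; }"),
    ("carol/fork", "lib.c", "int add(int a, int b) { return b + a; }"),
]
for repo, path, _ in dedupe_exact(results):
    print(repo, path)
```

A single changed byte gives a different digest, which is why near-similar documents need a separate technique.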
| elliottcarlson wrote:
| De-duping exact matches is a game changer -- search has been
| miserable to use because of the dupes for so long. I can live
| with near-similar documents. Very excited to test this out.
| colin353 wrote:
| Another GitHub Code Search developer here - to add more to
| this, we rank all the search results, and try to bring the
| most relevant results to the top. Ideally, if you have 10
| pages of results, you shouldn't have to leave page 1 to
| find what you're looking for :D
| sumtechguy wrote:
| That would be a tough problem. To de-dup, you probably want to
| show/point towards the 'original' tree. But which one is the
| source? Or even worse, someone abandons a project but someone
| else forked it and kept going: should it show that one
| instead? Or should it show the one it was forked from,
| depending on the version number? Which one is the 'true' repo
| now? Most certainly an interesting problem.
| francislavoie wrote:
| It really looks like they took a lot of inspiration from
| https://sourcegraph.com/search with this. Not a bad thing at all.
| I hope SourceGraph doesn't get obsoleted by this though, they're
| great people.
| junon wrote:
| I remember seeing this years ago and thought it was a bit
| subpar but it appears they've made strides since then. I might
| start using this again.
| lancemurdock wrote:
| had a pretty awful interview experience there a while back.
| Can't say I experienced great people
| anandchowdhary wrote:
| I interviewed for Sourcegraph and it was one of the best.
| Super transparent process, open source handbook, fun coding
| tasks -- really nothing to complain about. Would be curious
| to know what made you have such a different experience.
| sqs wrote:
| Sourcegraph CEO here. I'm really sorry about that. We work
| really hard on making our interviews good for everyone,
| including documenting it publicly at
| https://handbook.sourcegraph.com/talent/interview_process.
| Could you please email me at sqs@sourcegraph.com so I could
| find out what happened?
| mholt wrote:
| I'm surprised... I absolutely _loved_ my interview with
| Sourcegraph. I kind of wish every tech company interviewed
| like they do.
| gavinray wrote:
| I've met two of their devs randomly in different Discord
| servers. Both were great people (Noah, Olaf) and are very
| active in OSS communities. Perhaps not coincidentally, both
| worked on Language Server related stuff.
|
| Olafur is responsible for a lot of Scala tooling and some
| pretty neat original ideas.
|
| Sourcegraph also came up with LSIF, which is a useful format
| for building tooling for language servers:
|
| https://lsif.dev
|
| If you want to build this sort of stuff, the work Sourcegraph
| has done with LSIF + SemanticDB is probably your easiest bet.
|
| N=2 isn't great, but there's my experiences if we're tossing
| them out there.
| sqs wrote:
| Sourcegraph CEO here. Imitation is the sincerest form of
| flattery. We are very transparent, have a ton of users, and are
| open-core, so it's easy to get inspiration from us. :) We want
| way more devs to be using code search since it's so valuable
| 10x+/day, and if this helps, then we are very happy for that.
| Devs get to choose the code search tool they use, so the best
| tool will win (you wouldn't use Bing if your boss made
| you...likewise, code search isn't like team chat or team docs).
| trinovantes wrote:
| Still waiting for the ability to search in other branches. It's a
| pain when some codebases have stable releases on the next/dev
| branch but keep their main branch to the previous release.
| namrog84 wrote:
| Absolutely. I get they don't want to index every branch but at
| least set some heuristics like if it has a certain amount of
| activity or something per repo. Or even allow repo to opt into
| 1 to 2 other branches besides main. Especially for bigger
| projects
|
| That'd cover 95% of repo I've seen.
| jkelleyrtp wrote:
| Seems to be that Rust's killer app is burntsushi's mind and
| ripgrep. :-)
| samueldr wrote:
| Only thing missing is indexing of branches and forks.
|
| My main use case for GitHub search is identifying provenance of
| misc. changes in vendor source code tarballs for e.g. Android
| kernel releases. It's hard, but sometimes possible to rehydrate
| most of the existing commits through cherry-picks and careful
| rebases.
|
| The biggest problem with the lack of indexing branches and forks
| is that sometimes vendors make releases through branches, or
| that sometimes repos of interest are forks of e.g.
| `torvalds/linux`.
|
| Hopefully we can see those being indexed in the future.
|
| I'm also curious: has the plan to drop "less active" repos from
| the index gone through? Has anything changed?
| alufers wrote:
| > I'm also curious: has the plan to drop "less active" repos
| from the index gone through? Has anything changed?
|
| Whaaat? I hope it doesn't go through. I use GitHub code search
| for clues when reverse engineering cheap Chinese IoT crap.
| Usually I can find some headers / SDKs accidentally uploaded
| and set to public by a random Chinese guy. Those repos usually
| have one commit and zero traffic, but they contain invaluable
| information about proprietary MCUs.
| ihnorton wrote:
| I would personally like to see less indexing of duplicate
| files! There are many things I've searched for which return
| 100s of results from independent checkin-uploads of big
| libraries like the Android SDK. It would be great if results
| were filtered by file similarity regardless of git history (if
| that is in fact the issue).
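Filtering by file similarity regardless of git history could be approximated with Jaccard similarity over line shingles. The sketch below assumes that approach. The window size, the 0.5 threshold, and the names `shingles`, `jaccard`, and `filter_near_duplicates` are illustrative choices, not anything GitHub has described.

```python
def shingles(text, k=3):
    """Return the set of k-line windows over the non-blank, stripped lines."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < k:
        return {tuple(lines)}
    return {tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_near_duplicates(results, threshold=0.5):
    """Drop any (repo, content) result too similar to one already kept."""
    kept = []  # entries are (repo, content, shingle_set)
    for repo, content in results:
        sig = shingles(content)
        if all(jaccard(sig, other) < threshold for _, _, other in kept):
            kept.append((repo, content, sig))
    return [(repo, content) for repo, content, _ in kept]


original = "l1\nl2\nl3\nl4\nl5\nl6"
tweaked = "l1\nl2\nl3\nl4\nl5\nCHANGED"
unrelated = "x\ny\nz\nw"
results = [("r1", original), ("r2", tweaked), ("r3", unrelated)]
print([repo for repo, _ in filter_near_duplicates(results)])
```

Comparing every result against every kept result is quadratic, so at real scale one would likely swap in something like MinHash; that is outside this sketch.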
| beached_whale wrote:
| Got an opportunity to try it a few minutes ago and it's awesome
| so far. I was able to look for my code in repos I don't own, e.g.
| `not org:user foo::bar`
| beltsazar wrote:
| Does anyone know (or guess) what kind of index they use to
| provide regex searches? I'm really curious.
| 100k wrote:
| We'll be sharing more details soon on the GitHub blog.
| Falell wrote:
| > Search for an exact string, with support for substring matches
| and special characters, or use regular expressions (enclosed in /
| separators).
|
| Finally!
|
| Search-for-literal is so important when you have technical users
| working on non-prose text.
|
| They say this is going in a dedicated search page 'to start
| with', if "<literally any text>" doesn't work in the top bar
| eventually this is still going to be miserable.
| colin353 wrote:
| I'm from the team that developed this at GitHub - if you are in
| the technology preview, then you can jump into cs.github.com
| from searches done at the top bar.
| gavinray wrote:
| Thank you
|
| I use Github's UI for exploring and searching codebases more
| often than my own environment, since I do a lot of curious
| browsing.
|
| No offense, but the search is so bad for anything worse than
| a single word, that I've developed a sort of intuition for
| how to phrase things -- and then still spend a lot of time
| crawling pages of results haha.
|
| This was sorely needed
| colin353 wrote:
| Couldn't agree more - that's why we built it! Please give
| the new search a shot, I think you'll like it :D
| mholt wrote:
| What's your take on developing a new code search instead of
| partnering with an existing global code graph like
| Sourcegraph? What are the advantages of GitHub Code Search
| over Sourcegraph?
| edwinyzh wrote:
| Well, in the past I've tried Sourcegraph several times, but
| it never gave me an experience that matched the
| was-dead-many-years-ago Google Code Search. I hope the new
| GitHub code search does that.
| zxienin wrote:
| +1 pretty much what was on my mind, seeing this. does this
| compete or complement sourcegraph?
| bsagdiyev wrote:
| Now can they fix doing a language search for "Visual Basic"? If
| you filter a user's repos or stars on that language it just shows
| all their repos or stars. Code search for language "Visual Basic"
| returns all repositories and does not limit by language like it
| should.
| remram wrote:
| Meanwhile on GitLab, you can't even search in issue comments
| (only the title/description from the author).
| john_cogs wrote:
| GitLab team member here.
|
| Comment (and code) search is available for projects in all
| GitLab tiers:
| https://docs.gitlab.com/ee/user/search/#basic-search
|
| Premium and Ultimate users have access to Advanced Search:
| https://docs.gitlab.com/ee/user/search/advanced_search.html
| remram wrote:
| There is a way to search for comments using the "global
| search", but no way to search for text over issues and their
| comments. In particular, no way to search from the issue tab,
| no way to search over comments only in issues (or only in
| merge requests), no way to combine a text search with
| label/milestone/status filters, etc.
|
| So it's a workaround, but a bad one.
|
| Here's the ticket (2015):
| https://gitlab.com/gitlab-org/gitlab/-/issues/13891. The fact
| that it has so many duplicates in your own project's issue
| tracker is a good indicator of how bad your issue search is.
| boleary-gl wrote:
| GitLab team member here.
|
| > no way to combine a text search with
| label/milestone/status filters, etc.
|
| You can combine text search with field search (like
| label/milestone/status). Here's an example:
| https://gitlab.com/gitlab-org/gitlab/-/issues?search=Visuali...
| cosentiyes wrote:
| The addition of exact match search is so exciting that I haven't
| internalized any of the other new features. I've abandoned an
| ungodly number of semi-common-word searches after getting 30
| pages of results in a monorepo
| philsnow wrote:
| I didn't even see this in the feature list before doing the
| signup. One of the signup questions is "how do you usually
| search?" or so, I wrote in the blank "I want to search for
| symbols, not substrings, so if I'm searching for `bar` I don't
| want `foo_bar` to show up as a match". I usually do this with
| word boundaries in regexes, but I pretty much have to have the
| repo downloaded, so it's useless for searching on github.com
| this way.
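The word-boundary trick mentioned here looks like this in Python's `re` module. It works for the `foo_bar` case because `_` counts as a word character, so there is no boundary between `_` and `b`. `symbol_matches` is a hypothetical helper name.

```python
import re


def symbol_matches(lines, symbol):
    """Return the lines where symbol appears as a whole word."""
    # re.escape keeps symbols containing regex metacharacters literal.
    pattern = re.compile(r"\b" + re.escape(symbol) + r"\b")
    return [line for line in lines if pattern.search(line)]


lines = ["bar()", "foo_bar()", "x = bar + 1", "barista = 2"]
print(symbol_matches(lines, "bar"))
```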
| jrochkind1 wrote:
| I love how the Microsoft acquisition continues to result in
| _increased_ investment in github with Microsoft's resources, and
| real vision; not always how an acquisition goes.
| adamnemecek wrote:
| Microsoft has always been a dev tool company.
| einpoklum wrote:
| You wouldn't know it looking at MS Visual Studio though.
| NmAmDa wrote:
| I doubt that before WSL this would be something. I mean
| developing on Windows was always far more difficult than on
| Linux or MacOS.
| adamnemecek wrote:
| It depends on what you were developing.
| johannes1234321 wrote:
| Win32 API isn't nice, but Microsoft was always relatively
| good with documentation etc. and don't forget all the
| developer support within Excel, VBA, Visual basic etc. Bill
| Gates early on understood the premise of building a
| platform and not breaking it. Even if that meant win32 API
| became ugly over time. Old windows programs still work on
| newer releases.
| swyx wrote:
| and a departure of all the key execs
| jrochkind1 wrote:
| what about it? That's not even a sentence.
| mintplant wrote:
| If anyone from GitHub is listening, being able to exclude test
| code with a few clicks would be an absolute game-changer. By far
| the biggest source of noise in my GH code search results, and I
| use the tool (and similar tools like Searchfox) super super
| heavily. Either way, stoked to try this out.
| halayli wrote:
| Exactly this. It will also reduce unnecessary requests on their
| servers.
| Koffiepoeder wrote:
| And inversely, searching specifically for test code can also be
| useful. For example if searching for an implementation example.
| 100k wrote:
| Thanks for the feedback! We downrank test files with a
| heuristic, though we'll definitely be looking to make this more
| sophisticated. You can also exclude results using a regular
| expression, like `foo NOT path:/_test\.go$/`.
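To see what a path pattern such as `_test\.go$` excludes, the same regular expression can be tried locally. This sketch only mimics the filtering idea over a list of paths and says nothing about how the search backend evaluates `path:` qualifiers. `exclude_paths` and the sample paths are hypothetical.

```python
import re


def exclude_paths(paths, pattern):
    """Return the paths that do NOT match the given regular expression."""
    compiled = re.compile(pattern)
    return [p for p in paths if not compiled.search(p)]


paths = [
    "pkg/index/index.go",
    "pkg/index/index_test.go",
    "docs/how_to_test.gofmt.md",
    "cmd/search/main.go",
]
# The escaped dot matches a literal "."; "$" anchors at the end of the path.
print(exclude_paths(paths, r"_test\.go$"))
```

The third path survives because it neither ends in `.go` nor has a literal dot directly after `_test`.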
| dcreager wrote:
| And also note that if you often need to add this kind of
| qualifier to many searches, you can create a "custom scope"
| that includes it for you transparently.
| leaded_syrinx wrote:
| This is great, specified search on GitHub has previously been
| very hit or miss. Generally I use the search feature for learning
| / trying to see if something I'm trying to do already exists. I
| personally think vsCode has the best code search implementation,
| in terms of "exact", "partial" and "regex" matching. The UI is
| clear, non-technical team members can navigate their way around
| it and it's relatively fast assuming you don't have too many
| extraneous plugins installed.
| yashap wrote:
| Wow, HUGE feature, congrats to the team working on it! GH code
| search is a feature with such massive potential utility, but the
| old implementation was so weak it was basically useless. Looking
| forward to this, will use it constantly if it's good.
| AtNightWeCode wrote:
| Of all the tools I use on a daily basis Github is probably the
| worst. I mean the "Find a repository..." input field on the start
| page can not even filter out named repositories I have access to
| in all my organizations. It works for some repos but not all.
|
| Search improvements? It is impossible to create a worse search
| experience than Github. Just clone and use git grep instead in
| most cases.
|
| Edit: ...and the 425% price increase for SSO..
| oubliette wrote:
| Try constraining your search in Google/DDG with:
| site:github.com query
| post-it wrote:
| Could be worse, could be Reddit search.
|
| (Granted, this is largely due to a culture of titles like
| "Check out this thing" that provide zero searchable metadata +
| no tag system.)
| v1g1l4nt3 wrote:
| No need to clone if you just use
| https://sourcegraph.com/search.
| svnpenn wrote:
| Has the "Last indexed" been fixed?
|
| whenever I search for code, it will say something like "Last
| indexed on Apr 2", but if you go to the actual file, the date
| will say 5 years ago or something. So currently the "Last
| indexed" listed date is completely useless, and you have to
| basically click through to every result.
| 100k wrote:
| (I worked on that system and the new one.)
|
| Yes, sadly, that is literally when the file was _indexed_. So
| it's not particularly useful. It's a difficult problem to
| solve, but I'll bring up your feedback to the team.
| Petesta wrote:
| Glad to see GitHub's search has improved. I hope GitHub finally
| improves the search functionality on gists. You can't search your
| own gists by name.
| savanpatel wrote:
| Why does it matter to mention in the demo video that it was built
| in Rust? It should not matter to customers.
| dvirsky wrote:
| Are there any open source powerful code search engines out there?
| As a Googler the internal code search we have here is one of the
| most incredible things I've ever seen, it's so fast and powerful
| I'm amazed by it daily. Is there anything near that quality out
| there?
| dqv wrote:
| Not a Googler, so I can't say. There was Mozilla DXR but it has
| been abandoned.
| jcranmer wrote:
| DXR has largely been replaced with mozsearch
| (https://github.com/mozsearch/mozsearch), and a quick glance
| through the really early history does show that it adopted a
| fair amount of stuff from DXR. The downside is that it's not
| as easy to set up a local mozsearch instance as old-school
| DXR was.
| jcranmer wrote:
| I helped write DXR for indexing Mozilla's source code based on
| an instrumented compiler run; this has eventually been
| developed into mozsearch
| (https://github.com/mozsearch/mozsearch), whose indexing for
| mozilla-central is visible here: https://searchfox.org.
| dqv wrote:
| I thought it was abandoned! This is great to hear it just
| moved. Is there anyone at Mozilla that can update the old DXR
| repo [0] to direct people to MozSearch?
|
| [0]: https://github.com/mozilla/dxr
| jwin742 wrote:
| I work on a very large c++ monolith at work and DXR has been
| a real game changer for helping me just figure out how so
| much of the codebase works. Thanks!!
| Falell wrote:
| My job uses https://oracle.github.io/opengrok/ and I'm
| generally happy with it. It has some problems with special
| character searches at times but generally does what I want.
| It's certainly better than code search in our on-prem github
| instance.
| slaymaker1907 wrote:
| Yeah, opengrok is great. It is very fast and usually returns
| good results.
| ibraheemdev wrote:
| https://grep.app/ is a great alternative to github's current
| search engine.
| throwamon wrote:
| Would you by any chance be allowed to record a demo screencast?
| dti wrote:
| You can try it yourself, e.g., the instance the Android team
| uses: https://cs.android.com/
| dvirsky wrote:
| Oh, I didn't know this existed. The syntax seems to be on
| par with the internal one, I couldn't find any info on
| what's driving it.
| dti wrote:
| Also don't know how search works there, but the cross-
| reference functionality is powered by an open-source
| Kythe project: https://kythe.io/
| toomuchtodo wrote:
| If you don't mind me asking, any insight into why it hasn't
| been open sourced?
| dvirsky wrote:
| There is some older version that's open source, I haven't
| tried it and I don't know how much of today's code search is
| based on it.
|
| https://github.com/google/codesearch
| profquail wrote:
| Hoogle is pretty neat -- you can search by type signature and
| it'll find matching APIs from hackage packages:
| https://hoogle.haskell.org/
|
| Source: https://github.com/ndmitchell/hoogle
| beliu wrote:
| We built Sourcegraph taking inspiration from Google Code Search
| (https://about.sourcegraph.com/blog/ex-googler-guide-dev-tool...)
| to bring the power of code search--and precise code
| intelligence that just works--to every dev. Try it out here:
| https://sourcegraph.com. A super common thing we see is people
| leaving Google, missing code search, and then bringing
| Sourcegraph into their new org. We'd love to hear your
| feedback!
| beliu wrote:
| Sourcegraph is open-core, with a dual licensing approach. You
| can run the open-source version here:
| https://github.com/sourcegraph/sourcegraph#sourcegraph-oss,
| and we have an enterprise offering for companies that want to
| adopt for their teams. Similar to GitLab, both our enterprise
| and OSS code is publicly available.
| Arnavion wrote:
| The best thing about the Sourcegraph instance hosted on
| sourcegraph.com is that you can edit the URL in your browser
| from https://github.com/foo/bar to
| https://sourcegraph.com/github.com/foo/bar to be dropped down
| into a Sourcegraph search for that GH repo. I've been using
| it for a long time because of this convenience.
|
| (Though it would be even better if the two options for case-
| sensitivity and regex search were enabled by default instead
| of needing me to toggle them on every time.)
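The URL edit described in this comment is mechanical enough to script. A small sketch, assuming only the mapping stated above (prefix the GitHub host and path with `https://sourcegraph.com/`); `to_sourcegraph` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def to_sourcegraph(github_url):
    """Rewrite https://github.com/OWNER/REPO to its sourcegraph.com form."""
    parts = urlsplit(github_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")
    # Keep the original path and put the GitHub host in front of it.
    return "https://sourcegraph.com/github.com" + parts.path


print(to_sourcegraph("https://github.com/foo/bar"))
```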
| billcaplan wrote:
| You should be able to do that over in your User Settings
| (Click your picture in the top right and then Settings.)
| Adding these two things should change that default for you:
| "search.defaultCaseSensitive": true,
| "search.defaultPatternType": "regexp",
|
| Also see:
| https://docs.sourcegraph.com/admin/config/settings#search-de...
| Arnavion wrote:
| I don't have a user account (nor do I want to make one).
| axiosgunnar wrote:
| Are you worried this new Github Code Search might steal all
| your users?
| murat124 wrote:
| Not sure if it's good enough to replace https://grep.app/
| 100k wrote:
| (I worked on this.)
|
| Give it a shot and let us know what you think! Where can we
| improve it?
| beltsazar wrote:
| What kind of indexes do you use to provide regex searches?
| johndough wrote:
| Today I wanted to search for "strstr[a-z]+?_r" but got the
| error message "This is a partial result set. The search was
| stopped early because it would take too long to check every
| file for this regular expression.". However, I got results
| for the less restrictive regex "strstr.+?_r" which is weird
| since I'd expect that it would be easier to return results
| for more restrictive regular expressions. Not sure if there
| is a perfect solution for this, but in many cases, you could
| probably search for the less restrictive version and filter
| the results with the more restrictive one after that.
|
| Also it would be great if more repositories were indexed. How
| do things work behind the scenes? Maybe it is possible to
| build a more memory-efficient index just for exact string
| search, which probably make up most searches.
|
| Anyway, this website is amazing and I use it quite often.
| Thank you a lot for working on this!
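The workaround suggested in this comment (search with the less restrictive regex, then filter with the stricter one) can be sketched client-side as below. It assumes every match of the strict pattern is also a match of the broad one, which holds for the two patterns quoted above. `two_stage_search` and the sample lines are hypothetical.

```python
import re


def two_stage_search(lines, broad, strict):
    """Filter lines with a broad regex first, then with a stricter one."""
    broad_re, strict_re = re.compile(broad), re.compile(strict)
    # Stage 1: the broad pattern narrows the corpus to candidates.
    candidates = [line for line in lines if broad_re.search(line)]
    # Stage 2: the strict pattern runs only on those candidates.
    return [line for line in candidates if strict_re.search(line)]


lines = ["strstrfoo_r(x)", "strstr123_r(x)", "strlen(x)"]
print(two_stage_search(lines, r"strstr.+?_r", r"strstr[a-z]+?_r"))
```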
| 100k wrote:
| Thanks for the feedback, we're working on some changes to
| improve regular expression performance.
|
| We're also working hard to increase the number of
| repositories indexed. :)
| jayflux wrote:
| I think that app triggered the inspiration to do this. So I
| would think what they deliver will be similar or have some
| feature parity.
| deft wrote:
| I always thought the search was purposely bad and overly limited
| to prevent scraping for credentials.
| tyingq wrote:
| Ah, great. GitHub throwing out special characters in searches was
| infuriating for languages with sigils and patterns, like $somevar
| or %sql% and so on.
| oezi wrote:
| Any idea how to get further ahead on the waitlist for co-pilot?
| jimsimmons wrote:
| Slight tangent: The video has a guy describing the tool and he
| includes the fact that it's written in rust when introducing it.
| I've always found this sort of name dropping in rust
| projects/devs baffling. Is there anything that I'm expected to
| infer from it? Is it that its backend is memory safe? I can't
| think of anything else. Now it may very well be very memory safe
| but why include that highly specific detail when talking about a
| very high level thing that is the UX of search. What if it was
| written in Haskell or C#? Would it still be brought up? It's
| almost as if being written in rust is a feature in itself these
| days. As a technical guy I can't help but take the person less
| seriously, especially when it's as unwarranted as this.
| qaq wrote:
| It's obviously personal preference but as a technical guy I am
| always curious what lang. a project is using.
| nindalf wrote:
| He's talking about text search and the post thanks @BurntSushi.
| That means they're using the fastest text search tool out there
| - ripgrep. I won't mention what it's written in, because that
| clearly upsets you.
|
| Benchmark - ripgrep is faster than {grep, ag, git grep, ucg,
| pt, sift} (2016) - https://blog.burntsushi.net/ripgrep/
| t3rabytes wrote:
| Go had this issue for a while, too, it's finally started to
| calm down as Go hits a mainstream that is (imo) much farther
| than Rust is currently. I think much is just people trying to
| add validity to Rust for large-scale production workloads, in
| the same way that Kubernetes was "a compute scheduler written
| in Go" or Terraform was "infrastructure as code written in Go"
| (maybe those are bad examples, but I know I've seen the "X
| written in Go" thing going on).
| gscho wrote:
| This is exactly how I see it as well. Rust used to be an
| obscure language with a compiler written in OCaml. If
| something was written in D or zig, it's noteworthy so you
| mention it. I think rust has come into the mainstream enough
| that we can drop the "written in rust" line imo.
| eyelidlessness wrote:
| I think depending on where the audience is coming from--for
| example people who primarily work in scripting/interpreted
| languages--Rust can also be a positive signal for performance.
| colin353 wrote:
| Hey! That was me in the video.
|
| Not ashamed to be a Rust evangelist! The reason I mentioned
| Rust is because we spent a lot of time making the experience
| really fast - which is super important for a product like this.
| I really think getting the performance we have would have been
| enormously more difficult in any other language.
| Dowwie wrote:
| Fellow Rustacean here. Is the search engine secret sauce or
| something that could perhaps be open sourced? I'd like better
| tooling for searching private code bases. Also, would you
| consider writing about optimization techniques you used?
| colin353 wrote:
| We are looking into open sourcing some libraries that we've
| developed for search. And we're going to write a blog post
| with way more technical details soon!
| aaaaaaaaaaab wrote:
| It really is like the joke about vegans. So tiring.
| isaacimagine wrote:
| I agree with you, but I just wanted to point out the following:
|
| In general, Rust, C, and C++ are going to be faster than
| languages like Ruby*. He brought up Rust while discussing the
| performance of the new tool. Although performance is more
| complex than language choice, etc., saying it's written in Rust
| gives the viewer an approximate lower bound as to how fast the
| tool should be.
|
| *: (GH started as a Ruby shop, so I wouldn't be surprised if
| that's what the original tool was written in).
| ju-st wrote:
| Is there any good reason why the search doesn't find file names?
| Or does it now with the new search?
| colin353 wrote:
| The new search does find filenames! :D
| nerdkid93 wrote:
| I wonder if it is the followup to this conversation from last
| year when https://grep.app was released:
| https://news.ycombinator.com/item?id=22397728
| judge2020 wrote:
| Probably not, they've been looking at/working on improved Code
| Search since 2019: https://youtu.be/9EoNqyxtSRM?t=1726
| l0b0 wrote:
| Now, can we please get GitHub issues back into third party search
| engines? Now, whenever I search for something I _know_ is in an
| issue I only ever get results from those crappy GitHub scraper
| sites. This is happening on both Google and DuckDuckGo.
| valtism wrote:
| I don't think GitHub has any control over this without changing
| their content license.
| patrickdevivo wrote:
| this looks awesome! two things I've always wanted and haven't
| found satisfying solutions for in code search (in an editor)
|
| 1) an ability to easily express higher level concepts in a search
| that's aware of code semantics ("match only function names",
| "find call sites of a method") etc. Maybe this is possible with
| existing tools (probably is?) but I tend to get lazy about
| learning DSLs - would love to see this in a UI if it's possible
|
| 2) ability to save searches I do frequently - after a certain
| level of complexity in a query (I've added ignore rules, I
| crafted the right regex, etc), I want to be able to save the
| "context" of a search so that I can easily return to it later
| colin353 wrote:
| GitHub Code Search developer here:
|
| > would love to see this in a UI if it's possible
|
| We do have code navigation via the UI, so in a way it's
| possible!
|
| > ability to save searches I do frequently
|
| Absolutely! This is possible using "custom scopes". If you're
| in the technology preview, click on the scope dropdown, scroll
| to the bottom, and choose "custom scopes". You can make a
| custom scope to search a set of repositories, a particular
| language, within a directory, or any combination with boolean
| operators!
| pianoben wrote:
| I've been in the preview for a bit.
|
| 1) This doesn't seem to exist in quite that way, but you can
| prefix a literal with "def:" and the engine will return only
| definitions of that thing (so far as it can tell). It's not
| quite what you (or I!) want, but close.
|
| 2) This exists and is called "scopes". On the landing page, to
| the left of the search bar, click the grey pill that says "All
| repos". At the bottom there is a "custom scopes" option.
| colin353 wrote:
| Might also be worth checking out the syntax guide:
| https://cs.github.com/about/syntax#symbol
| jjwiseman wrote:
| It's local-only search, but you reminded me that this is
| possible with macOS Spotlight. I wrote an indexer (for Common
| Lisp) that let you search for function definitions, etc.
|
| http://lemonodor.com/archives/001232.html
|
| For example, if you're looking for a search-and-replace
| function you know you wrote or had somewhere on your machine,
| you could do: mdfind "org_lisp_defuns == '*search*replace*'"
|
| (Or just use the regular Spotlight UI.)
| beached_whale wrote:
| I just want to say: about time. A lot of the time when using
| libraries with inadequate documentation, being able to find
| usages of a method or class gives really good insight into the
| library. But the current code search's stemming removes all the
| context needed to find that and then gives alternate spellings
| too.
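|
| To make the stemming complaint concrete, here is a minimal sketch.
| The suffix stripper is a toy written for illustration, not the
| stemmer the current search uses (that is not stated here); it only
| shows how distinct identifiers collapse to one index term, so a
| search for a class name also matches unrelated spellings.

```python
# Toy suffix stripper, for illustration only. Real stemmers are more
# elaborate, and this is an assumption, not GitHub's implementation.
SUFFIXES = ("ing", "ed", "er", "s")


def toy_stem(token: str) -> str:
    """Lowercase the token and strip at most one common English suffix."""
    word = token.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Three different identifiers end up under the same index term.
identifiers = ["Parser", "parsing", "parsed"]
stems = {name: toy_stem(name) for name in identifiers}
```

| With an index built on such stems, a query for the class "Parser"
| cannot be told apart from "parsing" or "parsed", and the original
| casing is lost as well.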
| questiondev wrote:
| i was actually really surprised that this did not exist when i
| went to search github for the first time. you would think that an
| open source giant would have this ability but i guess there is a
| ton of computational load to achieve search in general. i'll
| probably get downvoted for bringing up a whacky idea, but imagine
| having some type of referencing system that is done through multi
| node p2p, so searching certain systems using shared resources. i
| guess the major problem would be if devs would actually spare
| some of their personal computational resources to help the
| community find things and not rely on special interest groups. i
| get it, i am old school as well. i started out on pascal and
| BASIC. but still think using creative solutions is fun. but you
| know, napster was cool back in the day prior to their lawsuits.
| and p2p was starting to pick up speed
| throwamon wrote:
| There was a recent post on search engines where I believe a P2P
| solution was mentioned (but maybe it was on some related post
| within a few days of this one):
| https://news.ycombinator.com/item?id=29417061
| ryanseys wrote:
| I'd love some shorter keywords here for searching so this was
| quickly composable into something useful.
|
| E.g.
|
| p: or f: instead of path: for filenames
|
| l: instead of language:
|
| -f: to exclude specific filenames (makes it easy to filter out
| tests)
|
| You get the idea.
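|
| A rough sketch of how such short forms could be expanded on the
| client side before a query is sent. The alias table is hypothetical
| and taken from the suggestion above; none of these short forms are
| documented GitHub qualifiers, and whether the engine accepts a
| leading "-" for exclusion is an assumption here.

```python
import re

# Hypothetical short forms from the suggestion above, mapped to the
# long qualifier names mentioned in this thread.
ALIASES = {"p": "path", "f": "path", "l": "language"}

# A qualifier is an optional leading "-", a key, a colon, then a value.
QUALIFIER = re.compile(r"^(-?)([A-Za-z]+):(.+)$")


def expand(query: str) -> str:
    """Rewrite short qualifiers to long names; leave other terms alone.

    Terms are split on whitespace, so quoted phrases containing spaces
    are not handled by this sketch.
    """
    out = []
    for term in query.split():
        m = QUALIFIER.match(term)
        if m and m.group(2) in ALIASES:
            neg, key, value = m.groups()
            out.append(f"{neg}{ALIASES[key]}:{value}")
        else:
            out.append(term)
    return " ".join(out)
```

| For example, expand("l:rust -f:test parse") gives
| "language:rust -path:test parse", while terms that merely contain a
| colon, such as a URL, pass through unchanged.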
| oever wrote:
| Code search on GitHub is only available to people that log in
| with Microsoft. Clicking on 'Code' redirects to the login page.
|
| It is not a friendly site. Open source projects would do better
| to use an open source code forge like <https://sr.ht/>.
| v1g1l4nt3 wrote:
| Meanwhile... https://sourcegraph.com/search
| W0lf wrote:
| Great. I usually use grep.app[1], as for me the GitHub search
| is mostly useless. Your mileage may vary though. That being said
| there are many other great search interfaces that I am using
| often when I'm trying to find solutions to common problems or
| specific design patterns. Chromium search[2] comes to mind,
| Mozilla's Firefox[3], Android[4] or of course Google[5]
|
| [1] https://grep.app/
|
| [2] https://cs.chromium.org/
|
| [3] https://dxr.mozilla.org/mozilla-central/source/
|
| [4] https://cs.android.com/
|
| [5] https://cs.opensource.google/
| v1g1l4nt3 wrote:
| Sourcegraph[6]
|
| [6] https://about.sourcegraph.com
| majso wrote:
| This is great! As a project manager I use GitHub search
| every day when I am searching for specific methods or parts of
| the code in order to find logical issues or bugs in the code.
___________________________________________________________________
(page generated 2021-12-08 23:00 UTC)