[HN Gopher] 20 years of Google Scholar
       ___________________________________________________________________
        
       20 years of Google Scholar
        
       Author : thepuppet33r
       Score  : 391 points
       Date   : 2024-11-18 18:01 UTC (1 days ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | thepuppet33r wrote:
       | Yes, Google deserves to be distrusted and avoided as a whole, but
       | Google Scholar is a genuinely net good for humanity.
        
         | dumpHero2 wrote:
         | I have a similar feeling about Gmail (its anti-spam engine is
         | effective), Google Maps, and Google Docs (which pioneered shared
         | docs. It feels outdated on many fronts now, but it was a
         | pioneer).
        
           | roflmaostc wrote:
           | anti-spam is only an issue if people dump their email
           | anywhere. I usually register my mail on webpages as
           | first.last+webpage@mail.com and once they would spam this
           | mail, it gets blacklisted.
           | 
           | I literally get only 1-3 real spam mails per month without
           | any filter.
        
             | dripton wrote:
             | Works great, until a page rejects email with a '+' in it.
        
               | 6510 wrote:
               | dots are ignored, can filter by john.doe@gmail.com
               | 
               | not sure about capital letters
        
               | hks0 wrote:
               | Not everyone's cup of tea, but quite nice if one can
               | afford it: I have my personal domain and a catch-all
               | inbox. So if I want to register at acme-co.xyz I will
               | just use acmecoxyz@my-domain.tld
               | 
               | Maybe I should start using random words though? Wonder if
               | someone will go bananas seeing their brand's name on my
               | domain.
        
               | kroltan wrote:
               | Yeah, I've had to explain that a couple times already,
               | usually when dealing with customer support or in-person
               | registrations.
               | 
               | And a "malicious" actor can get away with pretending to
               | be another company by spoofing the username if they know
               | your domain works like that. I don't think this has
               | reached spammers' repertoire yet, but I wouldn't be
               | surprised.
               | 
               | Eventually I'd like to have a way of generating random
               | email addresses that accept mail on demand, and put
               | everything else in quarantine automatically.
        
               | AshamedCaptain wrote:
               | Or just knows about this Gmail trick (it's been 20 years
               | already) and sends spam to your real mailbox.
               | 
               | Actually, I am surprised _any_ spammy website these days
               | would even honor the part after the +, and not just
               | directly send to the real mailbox name.
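
A minimal sketch of the point above: recovering the base mailbox from a plus-tagged address takes only a few lines. The function name `strip_subaddress` is hypothetical, and the dot-folding step assumes the Gmail-only behaviour mentioned upthread; other providers may treat dots as significant.

```python
def strip_subaddress(address: str, fold_gmail_dots: bool = True) -> str:
    """Return the base mailbox for a plus-tagged address (illustrative only)."""
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    # Everything after the first '+' in the local part is the subaddress tag.
    base = local.split("+", 1)[0]
    domain = domain.lower()
    # Assumption: only gmail.com ignores dots in the local part.
    if fold_gmail_dots and domain == "gmail.com":
        base = base.replace(".", "")
    return f"{base}@{domain}"


if __name__ == "__main__":
    print(strip_subaddress("first.last+webpage@mail.com"))
    print(strip_subaddress("john.doe+shop@gmail.com"))
```

Because the tag is trivially removable, it is probably more useful for sorting incoming mail than for hiding the real address.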
        
               | thechao wrote:
               | I used to require a "+..." on all emails. Any email that
               | didn't have the "+..." was sent to Spam automagically. My
               | family were whitelisted. I gave up, because too many
               | websites (early on) refused to take the "+..." marker, so
               | I ended up losing too much to Spam. It's easier to just
               | let Google sort it out.
        
               | gnopgnip wrote:
               | It's part of RFC 5233 Sieve Email Filtering: Subaddress
               | Extension
        
               | aorth wrote:
               | Good resource on this trick from 2010. It's not Gmail
               | specific.
               | 
               | https://people.cs.rutgers.edu/~watrous/plus-signs-in-
               | email-a...
        
             | janalsncm wrote:
             | I see this recommendation everywhere and I am genuinely
             | surprised that it works. Any spammer can find out your real
             | address since there is an obvious mapping from + addresses
             | to your real address. An actual solution would hide this
             | mapping.
        
               | bachmeier wrote:
               | Yeah. Fastmail masked addresses are random. The best you
               | can do is guess that an address might be masked, due to
               | it not being johnsmith@fastmail.com, but it provides no
               | information about your real email address.
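
A rough sketch of the "hidden mapping" idea, not a description of how Fastmail or any other provider implements it. The class `AliasBook` and its methods are hypothetical; the only assumption is a catch-all domain whose incoming mail can be routed by recipient. Aliases are random, so they reveal neither the real mailbox nor the site they were given to, and unknown recipients fall through to quarantine.

```python
import secrets


class AliasBook:
    """Maps random aliases to the site each one was handed to (hypothetical helper)."""

    def __init__(self, domain: str):
        self.domain = domain
        self._site_by_alias = {}

    def new_alias(self, site: str) -> str:
        # 6 random bytes -> 12 hex characters; carries no information about the owner.
        alias = f"{secrets.token_hex(6)}@{self.domain}"
        self._site_by_alias[alias] = site
        return alias

    def route(self, recipient: str) -> str:
        """Return the site an alias belongs to, or 'quarantine' if it was never issued."""
        return self._site_by_alias.get(recipient.lower(), "quarantine")


if __name__ == "__main__":
    book = AliasBook("my-domain.tld")
    alias = book.new_alias("acme-co.xyz")
    print(alias, "->", book.route(alias))
    print("acmecoxyz@my-domain.tld ->", book.route("acmecoxyz@my-domain.tld"))
```

If an alias starts receiving spam, the lookup also says which site leaked or sold it.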
        
             | JW_00000 wrote:
             | Too late for most people.
        
           | coderintherye wrote:
           | Good for users of Gmail, but is it a net good? Gmail spam
           | prevention is great for the Google Apps orgs I manage.
           | However, for the other inboxes the vast majority of spam they
           | receive comes from @gmail.com
        
             | thaumasiotes wrote:
             | > Gmail spam prevention is great for the Google Apps orgs I
             | manage.
             | 
             | Gmail is unlikely to let spam through.
             | 
             | But that doesn't make its spam filter great; it's also very
             | prone to blocking personal communication on the grounds
             | that it must actually have been spam. The principle of
             | gmail's spam filter is just "don't let anything through".
             | 
             | It would be much better to get more spam and also not have
             | my actual communications disappear.
        
           | whiplash451 wrote:
           | Try MS OneDrive before calling google docs outdated
           | 
           | Google spanks everyone else on robustness and responsiveness
        
             | rty32 wrote:
             | Yes until it fails
             | 
             | https://www.theverge.com/2023/11/27/23978591/google-drive-
             | de...
        
               | whiplash451 wrote:
               | That issue got resolved in a few days [1] -- and for each
               | and every one of these extremely rare events at Google,
               | you'll find similar ones at MS.
               | 
               | I am referring to robustness at scale and every day:
               | Google released auto-save years before MS. MS pales in
               | comparison in the UX.
               | 
               | Note: I have no vested interest in Google, not ex-
               | googler, etc.
               | 
               | [1]
               | https://support.google.com/drive/thread/245861992/drive-
               | for-...
        
             | Fogest wrote:
             | As much as I try to "de-google" myself and try to avoid
             | being trapped in the Google eco system, I'd definitely
             | choose it over MS Office. I am stuck in the MS Office eco
             | system at work. Some of their products are starting to
             | improve in MS Office, but you can still tell it's a lot of
             | hacks on top of old systems. Especially when it comes to the
             | whole teams/onedrive/sharepoint side of things.
             | 
             | One of my biggest gripes right now is that we heavily rely
             | on Microsoft Teams. A lot of our work laptops still are
             | stuck on 8gb of ram. I find Microsoft Teams can easily suck
             | back a full gig or more of RAM, especially when in a video
             | call. From my understanding, Teams is running essentially
             | like an Electron app (except using an Edge browser
             | packaged).
             | 
             | I have no problem with web based apps, but man, some
             | optimization is called for.
        
               | nextos wrote:
               | I use a decade-old NUC with plenty of RAM as a daily
               | driver. It doesn't struggle with anything except MS
               | Teams. It can churn through Zoom or Meet calls while
               | compiling code. Teams is a bloated mess that makes the
               | fans spin at max RPM.
               | 
               | It's crazy I can boot a kernel, with an entire graphics
               | and network stack, X and a terminal in less than 200 MB
               | but then the Teams webapp uses a massive amount of
               | resources and grinds everything else to a halt.
               | 
               | Word 365 also becomes incredibly laggy on long documents
               | with tons of comments, whereas Google Docs is just fine.
               | But, apparently, this is also a thing on modern hardware.
               | I guess these days Microsoft has little attention to
               | detail.
        
               | Fogest wrote:
               | It's funny because sometimes Teams uses more resources
               | than the Edge browser. Despite Teams being Edge based for
               | their application.
               | 
               | I think overall many companies have gotten lazy/sloppy
               | when it comes to optimization. Game dev is even worse for
               | this. I like how Microsoft products integrate with each
               | other, but often the whole thing feels sloppy and
               | unoptimized.
        
           | globular-toast wrote:
           | Google maps would only be a net good if the data was
           | available under a free licence. As it is they take data from
           | people that should have gone to a public project like
           | OpenStreetMap.
        
             | arccy wrote:
             | "take", these people would never have produced any data if
             | gmaps wasn't there...
        
               | hatthew wrote:
               | At one point I contributed quite a bit to google maps,
               | because it was the primary map system I was using at the
               | time. Had I been using an OSM-based system, I would have
               | made contributions there instead.
        
               | arccy wrote:
               | indeed, osm can't paint itself like a victim, it needs
               | good end products to bring in contributors.
        
             | wbl wrote:
             | I ran into trouble because Open Topo does not report a
             | stream the 7.5" series does. There are serious data quality
             | issues that can make it not work for some applications.
        
           | gray_-_wolf wrote:
           | Most of the spam I get is _from_ gmail. Maybe they should
           | apply their so effective spam engine to outgoing mail as
           | well...
        
             | crazygringo wrote:
             | It's probably not. You can put any domain you want on the
             | "from" address. Just because it says it was from Gmail
             | doesn't mean it actually was, unless it's signed with DKIM
             | etc.
             | 
             | I had a domain for a while that people got spam "from" all
             | the time. It had nothing to do with me and there was
             | nothing I could do about it.
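
To make the From-versus-DKIM distinction concrete, here is a minimal sketch that compares the visible From domain with the DKIM verdict recorded in an `Authentication-Results` header. It assumes that header was added by your own receiving server (a sender can forge one too) and it only reads the recorded verdict; it does not verify any signature. The function name and the relaxed subdomain match are illustrative choices.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def from_domain_is_dkim_aligned(raw_message: str) -> bool:
    """True if a recorded dkim=pass result covers the domain shown in From."""
    msg = message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    from_domain = addr.rpartition("@")[2].lower()
    if not from_domain:
        return False
    for header in msg.get_all("Authentication-Results", []):
        value = " ".join(header.split())  # unfold continuation lines
        # Look for "dkim=pass ... header.d=<signing domain>" within one result clause.
        for match in re.finditer(r"dkim=pass\b[^;]*?header\.d=([\w.-]+)", value, re.I):
            signing = match.group(1).lower()
            if from_domain == signing or from_domain.endswith("." + signing):
                return True
    return False
```

A message that merely says `From: someone@gmail.com` but carries no passing DKIM result for gmail.com would return False here, which is the spoofing case described above.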
        
               | dpifke wrote:
               | I run mail servers for myself, a couple of side projects,
               | and some friends and family. A double-digit percentage of
               | all spam caught by my filters is from Google's mail
               | servers, not just forged @gmail.com addresses.
               | 
               | Of the "too big to block outright" spam senders, behind
               | Twilio Sendgrid and Weebly, Google is currently #3.
               | Amazon is a close #4. None of the top four currently have
               | useful abuse reporting mechanisms... Sendgrid used to be
               | OK, but they no longer seem to take any action. Google
               | doesn't even accept abuse reports, which is ironic
               | because "does not accept or act upon abuse reports" is
               | criteria for being blocked by Google.
               | 
               | Most spam from Google is fake invoices and 419 scams.
               | This is trivially filtered on my end, which makes it
               | perplexing Google doesn't choose to do so. I can
               | guarantee that exactly 0% of Gmail users sending out
               | renewal invoices for "N0rton Anti-Virus" are legitimate.
        
               | gray_-_wolf wrote:
               | I would hope google has DKIM and SPF set.
        
               | csomar wrote:
               | The Spam I get from "gmail" and ends up in my spam folder
               | is spoofed. The Spam I get from gmail and ends up in my
               | inbox _is_ from gmail. Spammers will mass-create accounts
               | and mass-sell them to spammers.
        
           | AlienRobot wrote:
           | "Google is evil, except for all the Google products Google
           | produced"
           | 
           | Honestly, if we compare Google to Amazon, Microsoft, Apple,
           | and Meta, isn't Google the least evil one?
        
             | insane_dreamer wrote:
             | No, I'd put them in this descending order of evilness:
             | Meta, Amazon, Microsoft, Google, Apple.
        
           | asdff wrote:
           | Do people use shared docs often in the workplace? I only used
           | it on like two group projects in school and it probably made
           | things more clunky than if we just wrote our portions and
           | compiled them after. Maybe it works for some workflows but
           | having multiple people editing the same document is chaotic,
           | unless you delegate who does what, at which point there's no
           | point in having it be a shared doc when the responsibilities
           | are delegated.
        
             | mwest217 wrote:
             | All the time, it is incredibly useful to send a doc for
             | comments, which can be attached to the relevant piece of
             | text. I use shared editing less often, but I find it's
             | especially useful in incident response where there may be
             | multiple investigation workstreams, and the incident
             | commander needs to be able to see all of them.
        
           | guappa wrote:
           | Nah google maps shows/hides things with very obscure logic.
           | 
           | Like you can ask to find a restaurant and it won't point you
           | to the closest one but to one that is a few km away instead.
        
         | insane_dreamer wrote:
         | Google Maps is a net positive as well
        
           | robertlagrant wrote:
           | Some more:
           | 
           | - Google Search
           | 
           | - YouTube (more debatable, but I think it's a marvel)
           | 
           | - Google Books
           | 
           | - ChromeBooks
           | 
           | - Android
           | 
           | - Google Calendar
           | 
           | - Google Earth
           | 
           | - Google Drive
           | 
           | - Google Docs
           | 
           | - Waze
           | 
           | - Android Auto
           | 
           | - Google Pay
           | 
           | - Kubernetes
           | 
           | - Go
           | 
           | - VP8 / VP9
           | 
           | I'd rather take all those products than leave them.
        
             | insane_dreamer wrote:
             | ok, but Search aside (which is Google's primary product; we
             | were talking about side project), many of these are also-
             | rans; they didn't really change the landscape the way
             | Google Maps did (and of course Search). OK, maybe Android, but
             | that wasn't developed by Google. Neither was YouTube
             | (groundbreaking), or Waze (not groundbreaking).
             | 
             | The only one I would take from your list would be
             | Kubernetes and Google Earth, and Kubernetes being more of a
             | dev tool wouldn't really count as far as impact and usefulness
             | to society (Go would fit there).
             | 
             | Google Books _could_ have been great, but Google didn't
             | take care of it. Same with Google Reader.
        
               | hfsh wrote:
               | >which is Google's primary product
               | 
               | Which _used_ to be Google's primary product, way waaaay
               | back when. Their primary product now is advertising, and
               | has been for a very long time.
        
         | codeflo wrote:
         | I'll reserve judgement on its net effect until the moment they
         | kill it.
        
       | renewiltord wrote:
       | Google Scholar is fantastic stuff. I am so grateful for it. It's
       | crazy how easy it is to find papers these days by just going to
       | it. University library search functions are completely useless in
       | comparison.
        
       | elashri wrote:
       | I did not know about the Scholar PDF Reader extension [1].
       | Unfortunately the reason is that I use Firefox only (and safari
       | iOS) and it is not available there. The AI outlines will be
       | useful and I can think of myself using it.
       | 
       | I do not want to comment on number 20. I really wished that I
       | joined CERN 10 years earlier but then it is the mistake of my
       | parents :)
       | 
       | [1] https://chromewebstore.google.com/detail/google-scholar-
       | pdf-...
        
       | dctoedt wrote:
       | I'd not known about "F.D.C. Willard" -- the _nom de plume_ of a
       | Michigan State physics professor's Siamese cat, Chester -- who
       | was listed as a co-author of a number of the professor's physics
       | papers.
       | 
       | More on Chester and his co-author status:
       | https://en.wikipedia.org/wiki/F._D._C._Willard
        
       | mananaysiempre wrote:
       | 21. Google Scholar will deny access to you if you (need to) self-
       | host a VPN on a common VPS provider. Being a Google product, it
       | also can't be special-cased in your routing table. (I genuinely
       | had to retrain myself to use Google Scholar again once I no
       | longer had that need.)
       | 
       | 22. Switching on sort by date will impose a filter to papers
       | published within the year, and you cannot do anything about that.
        
         | eesmith wrote:
         | > 22. Switching on sort by date will impose a filter to papers
         | published within the year, and you cannot do anything about
         | that.
         | 
         | !!! And here I thought it's been broken for years, and a sign
         | of decay due to lack of internal support.
        
           | buildbot wrote:
           | I swear this was working for me until literally today, it was
           | really useful to find older ML papers?!
        
             | mananaysiempre wrote:
             | There is _filter_ by date and _sort_ by date. The former
             | works. The latter, when enabled, even adds a banner on top
             | of the page (in large but gray type) that says "Articles
             | added in the last year, sorted by date", and resets any
             | filter you might have set before.
        
               | MichaelZuo wrote:
               | Was this change ever logged or noted some way? Or did it
               | just show up one day?
        
               | philipkglass wrote:
               | If it ever returned time-sorted results without limit,
               | that was long in the past. It has truncated results to
               | one year for the last several years I have used Scholar.
        
               | crazygringo wrote:
               | It seems so intentionally "broken", I can only guess it
               | is to prevent scraping? Since searching for generic-ish
               | search terms and sorting by date is a common scraping
               | strategy.
               | 
               | Still, you'd think they'd do a cutoff of e.g. 500 or
               | 1,000 items rather than filter by the past year.
               | 
               | So I can't help but wonder if it's a contractual
               | limitation insisted on by publishers? Since the
               | publishers also don't want all their papers being
               | spidered via Scholar? It feels kind of like a limitation
               | a lawyer came up with.
        
               | eesmith wrote:
               | Unlikely, since the easy work-around for scrapers is to
               | search by date range and grab things that way. That's
               | what I do now manually.
        
               | asdff wrote:
               | pubmed is literally built for academic scraping. It even
               | has a command line interface to access it. If publishers
               | were worried about scraping they'd target that, but they
               | don't. In fact when papers go on pubmed after a year they
               | are rehosted by pubmed central and made freely available
               | to anyone in the world.
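
As a hedged illustration of programmatic PubMed access: the sketch below only builds a search URL for NCBI's E-utilities `esearch` endpoint, which is the HTTP interface rather than the command-line tool mentioned above. The helper name `pubmed_search_url` is made up for this example, and no request is sent.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that would return matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",      # which Entrez database to search
        "term": term,        # the search expression
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # response format
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(pubmed_search_url("helium-3 exchange", retmax=5))
```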
        
       | svat wrote:
       | Related: 2014 article by Steven Levy, titled "The Gentleman Who
       | Made Scholar": https://www.wired.com/2014/10/the-gentleman-who-
       | made-scholar...
        
         | Thrymr wrote:
         | > Would he want to continue working on Scholar for another ten
         | years? "One always believes there are other opportunities, but
         | the problem is how to pursue them when you are in a place you
         | like and you have been doing really well. I can do problems
         | that seem very interesting to me -- but the biggest impact I can
         | possibly make is helping people who are solving the world's
         | problems to be more efficient. If I can make the world's
         | researchers ten percent more efficient, consider the cumulative
         | impact of that. So if I ended up spending the next ten years
         | doing this, I think I would be extremely happy."
         | 
         | Has he still been working on it in the 10 years since this
         | article? His name is in the byline of the new blog post, but
         | it's not clear from that how much he's been working on it.
        
           | the-rc wrote:
           | 12-13 years ago, I ran the system that inlined Scholar and
           | other results on the main search result pages. Anurag was
           | still involved, but AFAIR Alex, the other author of the post
           | who also had been there from the start, worked on most code
           | changes. I would guess that things are more or less the same
           | today. (Because it had such limited headcount, Scholar was
           | known to lag behind other services when it came to
           | code/infrastructure migrations.)
        
             | jll29 wrote:
             | Thanks for that inside scoop, even if it's a bit dated; I
             | wonder if they read this discussion, perhaps.
             | 
             | An important feature request would be a view where only
             | peer-reviewed publications (specifically, not ArXiv and
             | other pre-print archives) are included in the citation
             | counts, and self-citations are also excluded.
             | 
             | A way to download all citation sources would also be a
             | great nice-to-have.
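
The requested view could be computed roughly as follows. This is only a sketch on made-up data: the `Citation` record and its fields are hypothetical, and Scholar exposes no such structure. A citation counts only if the citing work is peer-reviewed and shares no author with the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    citing_authors: frozenset  # authors of the citing work
    peer_reviewed: bool        # False for preprints (e.g. arXiv)


def filtered_citation_count(paper_authors, citations) -> int:
    """Count peer-reviewed citations that are not self-citations."""
    authors = frozenset(paper_authors)
    return sum(
        1
        for c in citations
        if c.peer_reviewed and not (c.citing_authors & authors)
    )


if __name__ == "__main__":
    cites = [
        Citation(frozenset({"A. Author"}), True),   # self-citation: excluded
        Citation(frozenset({"B. Other"}), False),   # preprint: excluded
        Citation(frozenset({"C. Third"}), True),    # counted
    ]
    print(filtered_citation_count({"A. Author", "D. Coauthor"}, cites))
```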
        
       | zeroonetwothree wrote:
       | Google Scholar is so good. I started doing research right when it
       | came out and it was amazingly helpful. I can't imagine how it was
       | done before.
        
         | IshKebab wrote:
         | There are alternatives, like Web of Knowledge. You basically
         | need to be in a Uni for that though.
        
         | leephillips wrote:
         | I would go to the library and pull volumes of _Science Citation
         | Index_ off the shelves. Yes, Google Scholar was a revolution.
        
         | dekhn wrote:
         | I'd go to the card catalog (index), turn my question into a bag
         | of words (tokenize), fetch all the cards matching each token
         | (posting lists), drop cards which didn't include enough of the
         | tokens (posting list intersection), ordering the cards by the
         | number of tokens they matched (keyword match ranking), filter
         | at some cutoff, and then reorder based on the h-index of the
         | author (page rank). Then I would read each paper in order,
         | following citations in a breadth-first manner.
         | 
         | (the above is a joke comparing old school library work to
         | search engines circa 2000; I didn't actually do all those
         | steps. I'd usually just find the most recent review article and
         | read the papers it cited).
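
The card-catalog steps in the joke map onto a tiny inverted index. The sketch below covers tokenizing, posting lists, a minimum-match filter, and ranking by match count; the final reordering by author h-index is left out, and all names and sample titles are illustrative.

```python
from collections import defaultdict


def tokenize(text):
    """Turn a title or question into a bag of lowercase words."""
    return text.lower().split()


def build_index(cards):
    """Map each token to the set of card ids containing it (posting lists)."""
    index = defaultdict(set)
    for card_id, title in cards.items():
        for token in tokenize(title):
            index[token].add(card_id)
    return index


def search(index, query, min_matches=2):
    """Return card ids matching at least min_matches query tokens, best first."""
    counts = defaultdict(int)
    for token in set(tokenize(query)):
        for card_id in index.get(token, ()):
            counts[card_id] += 1
    hits = [(n, cid) for cid, n in counts.items() if n >= min_matches]
    # Rank by number of matched tokens, breaking ties by card id.
    return [cid for n, cid in sorted(hits, key=lambda h: (-h[0], h[1]))]


if __name__ == "__main__":
    cards = {
        1: "exchange effects in solid helium",
        2: "magnetic properties of solid helium",
        3: "graph search algorithms",
    }
    print(search(build_index(cards), "solid helium exchange"))
```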
        
         | asdff wrote:
         | I had an old boss who did it in the old analog way. He had a
         | secretary handle his email and transcribing stuff he hand
         | wrote. He had print subscriptions to a couple nature journals,
         | science, and a couple research niche specific journals and he
         | read them basically cover to cover. He'd attend conferences and
         | had many collaborators who would send him papers from their own
         | lab to opine on.
         | 
         | I actually respect this style a lot. There is a firehose of
         | papers coming onto google scholar each day. You type in some
         | keyword you get 500 hits. This cut that down substantially for
         | him in a way where he never missed anything big (reading nature
         | and science), kept up with what the field has been doing
         | (reading the more niche specific journals and keeping up with
         | the labs who put out this niche work), and seeing what was
         | coming up in the pipeline from the conferences or what sort of
         | research new grants were requesting. I'm not sure that scholar
         | would have helped much.
        
       | kylebenzle wrote:
       | I was hoping it would be 20 tips and tricks on how to use the
       | service better not random fun facts about its history :-(
        
       | chromatin wrote:
       | 21. No API
        
       | malshe wrote:
       | I use Google Scholar daily and it's been a fantastic resource.
       | Google Scholar with Zotero completes my articles search and
       | storage.
       | 
       | Btw, Anurag's last name is misspelt under the picture. It reads
       | "Achurya" instead of "Acharya"
       | 
       | Edit: They fixed it
        
       | lbeckman314 wrote:
       | > 18. A paw-sitive contribution to Physics. F.D.C Willard
       | (otherwise known as Chester, the Siamese cat) is listed as a co-
       | author on an article entitled: "Two, Three, and Four-Atom
       | Exchange Effects" that explores the magnetic properties of solid
       | helium-3 and how interactions between its atoms influence its
       | behavior at extremely low temperatures. Chester's starring role
       | came about because his co-author/owner, Jack H. Hetherington
       | wrote the entire paper with the plural "we" instead of a single
       | "I."
       | 
       | ---
       | 
       | 'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J.
       | H. Hetherington and F. D. C. Willard [0, 1, 2]
       | 
       | [0]
       | https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...
       | 
       | [1] https://xkeys.com/about/jackspages/fdcwillard.html
       | 
       | [2] https://en.wikipedia.org/wiki/F._D._C._Willard
        
         | lr1970 wrote:
         | Sir Andre Geim [0], the only person in the world who received
         | both the real Nobel prize in Physics and the Ig Nobel prize,
         | co-authored one of his articles [1] with his hamster Tisha.
         | 
         | [0] https://en.wikipedia.org/wiki/Andre_Geim
         | 
         | [1]
         | https://repository.ubn.ru.nl//bitstream/handle/2066/249681/2...
        
       | russellbeattie wrote:
       | Huh. I tried the "Listen to article" button, because I knew it
       | was going to be generated and was curious to hear how it sounded.
       | 
       | Interestingly, it highlighted the words as it read. I haven't
       | seen that before online. Not sure how useful it is (especially
       | for anyone interested in this particular topic), but I thought it
       | was a neat innovation nevertheless.
        
       | gexaha wrote:
       | The most fun fact is that it still exists!
        
       | robwwilliams wrote:
       | Our department uses GScholar as a great research-focused CV
       | generator. Not used formally except that faculty pages have a
       | link to their GS pages.
        
       | wseqyrku wrote:
       | For a second I thought this was buzzfeed for some reason.
        
       | GeoAtreides wrote:
       | oh no
       | 
       | they remembered google scholar exists
       | 
       | it's a great product and I don't trust google at all not to break
       | it or mess with it
        
         | crazygringo wrote:
         | Google employs a lot of people from academia. Scholar is used
         | and loved by a _lot_ of people _within_ Google. It's been
         | around for two decades. I really don't think it's going
         | anywhere.
        
           | dekhn wrote:
           | Reader was used and loved by a LOT of people WITHIN google,
           | but it was shut down (and the leadership that loved it even
           | made arguments in front of the company why it "had to be shut
           | down").
           | 
           | AFAICT Scholar remains because Anurag built up massive cred
           | in the early years (he was a critically important search
           | engineer) with Larry Page and kept his infra costs and
           | headcount really small, while also taking advantage of search
           | infra.
        
             | crazygringo wrote:
             | If it matters, they cited declining usage of Reader as a
             | reason for shutting it down.
             | 
             | It seems like Scholar has an overall upward trend, although
             | their methodology notes make it hard to compare some
             | periods directly:
             | 
             | https://trends.google.com/trends/explore?date=all&q=%2Fm%2F
             | 0...
             | 
             | I'm basically assuming this is the rate of growth of
             | graduate school, and no competing products have had any
             | real effect?
        
               | dekhn wrote:
               | Reader usage was declining because the application was
               | not being developed. The other reason they mentioned is
               | that it would have required a lot of work to rewrite the
               | app to be consistent with the new user data policies
               | being put into place.
        
       | afandian wrote:
       | Some fun Google Scholar history from another perspective.
       | 
       | https://youtu.be/DZ2Bgwyx3nU?t=315
       | 
       | I recommend you watch the rest of the video, on the subject of
       | open/closed and enclosure of infrastructure.
        
       | teruakohatu wrote:
       | The best thing, by a long way, that Google Scholar has achieved
       | is denying Elsevier & co a monopoly on academic search.
       | 
       | In most universities here in New Zealand, articles have to be
       | published in a journal indexed by Elsevier's Scopus. If it is not in
       | a Scopus-indexed journal, it does not count any more than a Reddit
       | comment. This gives Elsevier tremendous power. But in CS/ML/AI
       | most academics and students turn to Google Scholar first when
       | doing searches.
        
         | freefaler wrote:
         | or turn to sci-hub and annas-archive :)
        
           | philipkglass wrote:
           | You use Google Scholar to find papers you're interested in,
           | then use sci-hub to actually read them.
        
             | freefaler wrote:
             | indeed... and use Zotero with the correct plugin to
             | download them automagically
        
               | epcoa wrote:
               | sci-hub hasn't been updated in 4 years and the sources
               | for annas-archive like nexus-stc are seriously hit or
               | miss (depends on the field).
        
               | freefaler wrote:
               | Nothing lasts forever, but the model of buying a paper
               | for 40$ from Elsevier isn't much better. Depending on the
               | field there are other sources, but still a hit rate is
               | about 85-90%.
        
               | mateus1 wrote:
               | Any alternatives?
        
             | consf wrote:
             | I remember that when I no longer had access to university
             | subscriptions, Sci-Hub was my salvation during those times
             | when I had no money of my own
        
             | orochimaaru wrote:
             | Not sure if researchgate is still a thing. I had it and
             | uploaded all my papers there. They show up automatically on
             | Google. I believe this is allowed since you're allowed to
             | share copies of your publication on your website.
             | 
             | The problem is my researchgate account was connected to my
             | academic account. It's been a while since I graduated so
             | I've lost access to my own publications and page.
             | 
             | But I used to use researchgate and requests in researchgate
             | quite a bit.
        
           | teruakohatu wrote:
           | Does sci-hub have up to date content these days?
           | 
           | Having pretty wide journal access through my institution
           | means I don't need to reach out to sci-hub.
        
             | epcoa wrote:
             | sci-hub proper hasn't been updated since its indefinite
             | pause in December 2020. Alternatives are of variable
             | success depending on field. It might be better for CS/Math,
             | but for medicine and life sciences it's pretty bad.
        
               | whimsicalism wrote:
               | i believe they paused due to an indian court injunction
               | and the case was heard this year, does anyone know any
               | update?
        
               | insane_dreamer wrote:
               | How would an Indian court case have any jurisdiction in
               | Russia (not to mention mirrors)?
        
               | cipheredStones wrote:
               | Sci-Hub complied with the order with the intent to
               | actually argue their case (and possibly establish a legal
               | justification for the site), rather than just defying the
               | order and continuing to play cat-and-mouse with every
               | authority.
        
               | joshuaissac wrote:
               | And this is because they have a chance of winning. The
               | same court has previously adopted a broad interpretation
               | of what constitutes fair dealing.
               | 
               | https://en.wikipedia.org/wiki/University_of_Oxford_v._Ram
               | esh...
        
           | whimsicalism wrote:
           | scihub is dying unfortunately :( the good news is it is
           | happening just as all the fields i'm interested in except for
           | some experimental physics & biology have moved to OA
        
             | Loughla wrote:
             | oa resources have really kicked it into high gear post
             | covid. They used to be kind of a joke, but they're actually
             | competitive now. It's nice to see.
        
               | Onawa wrote:
               | I believe NIH's directive that all intramural and
               | extramural research must be published OA has helped move
               | things in that direction quite a lot.
        
             | kedarkhand wrote:
             | Sorry but what is OA?
        
               | bloak wrote:
               | https://en.wikipedia.org/wiki/Open_access (I assume)
        
           | thrdbndndn wrote:
           | I'm a proud user of sci-hub, but when I was still in
           | academia, I never used it. My school has access to all
           | the journals I ever needed, plus more old non-digitized ones
           | I can borrow from library (including interlibrary access).
        
             | thrw42A8N wrote:
             | My school has no such thing and yet requires me to find and
             | cite research.
        
               | consf wrote:
               | I think access to research shouldn't be a luxury or
               | dependent on where you study
        
               | slashtab wrote:
               | Reminds me of Aaron Swartz.
        
               | Melatonic wrote:
               | The legend
        
             | consf wrote:
             | For those outside such institutions, tools like Sci-Hub
             | often become a lifeline (as it was for me)
        
             | ryzvonusef wrote:
             | It depends on the discipline, also the mode of learning
             | (I'm distance learning so no physical library access).
             | 
             | My uni (Northampton) has access to a LOT of journals... but
             | has a blindspot in management, specifically accountancy
             | focus journals; am doing my lit review for my MSc
             | dissertation and the number of times I hit a dead end is
             | frustrating.
             | 
             | Sci-hub and Annas-Archive are also not interested in that
             | segment, so double whammy.
             | 
             | But surprisingly Archive.org was able to help me out a bit,
             | so thanks for that.
        
             | Suppafly wrote:
             | >My school has access to all the journals I ever needed
             | 
             | I miss being on a university network and having paywalled
             | journals and such just magically load.
        
         | p4bl0 wrote:
         | Yet it still participates and encourages the bibliometrics
         | game, which benefits the big publishers.
         | 
         | A simple way to make a step away from encouraging bibliometrics
         | (which would be a step in the right direction) would be to list
         | publications by date (most recent first) on authors pages
         | rather than by citations count, or at least to let either users
         | and/or authors choose the default sorting they want to use
         | (when visiting a page for users, for their page by default for
         | authors).
        
           | Scriddie wrote:
           | this^10
        
           | SideQuark wrote:
           | > the bibliometrics game
           | 
           | Bibliometrics, in use for over 150 years now, is not a game.
           | That's like arguing there is no value in the PageRank
           | algorithm, and no validity to trying to find out which
           | journals or researchers or research teams publish better
           | content using evidence to do so.
           | 
           | > which benefits the big publishers
           | 
           | Ignoring that it helps small researchers seems short sighted.
           | 
           | > A simple way to make a step ... would be to list
           | publications by date
           | 
           | It's really that hard to click "year" and have that sorted?
           | 
           | It's almost a certainty when someone is looking for a
           | scholar, they are looking for more highly cited work than
           | not, so the default is probably the best use of readers' time.
           | I absolutely know when I look up an author, I am interested
           | in what other work they did that is highly regarded more than
           | any other factor. Once in a while I look to see what they did
           | recently, which is exactly one click away.
        
             | mindcrime wrote:
             | To be fair, you did hedge and say "almost a certainty" and
             | maybe that's true. But speaking for myself, I generally
             | couldn't care less about citation count. If anything, my
             | interest in a document may be inversely proportional to the
             | citation count. And that's because I'm often looking for
             | either a. "lost gems" - things that are actually
             | great/useful research, but that got overlooked for whatever
             | reason, or b. historical references to obscure topics that
             | I'm deep-diving into.
             | 
             | BUT... I'm not in formal academia, I care very little about
             | publishing research myself (at least not from a
             | bibliometric perspective. For me "publishing" might be
             | writing a blog post or maybe submitting a pre-print
             | somewhere) so I'm just not part of that whole
             | (racket|game|whatever-you-want-to-call-it).
        
       | jrochkind1 wrote:
       | > 1. The team started with just two of us.
       | 
       | My guess for a while has been that it was back to two of them! if
       | that!
        
       | p4bl0 wrote:
       | I wish GScholar wouldn't embrace bibliometrics so much. Sort
       | papers by date (most recent papers first) by default on an
       | author's page rather than by citation count, or at least give
       | authors the choice to individually opt-in to sort by date by
       | default.
        
       | random3 wrote:
       | Fun fact about Google Scholar: it's "free", but it's just another
       | soulless Google product - no clear strategy, no support, and a
       | fragile proprietary dependency in what should be an open
       | ecosystem. This creates inherent risks for the academic
       | community. We need the equivalent of arXiv for Google Scholar
        
         | afandian wrote:
         | The Invest in Open site has a good directory of open tools.
         | 
         | https://infrafinder.investinopen.org/solutions
        
         | kergonath wrote:
         | Yes. On one hand I'd like Google to improve things a bit. There
         | are some rough edges, which is a shame because it indexes some
         | things that are not in Scopus or Web of Knowledge, like theses
         | and preprint repositories. On the other hand I worry that some
         | manager somewhere would kill it if they realised that it is
         | still around.
        
           | random3 wrote:
           | Every 1-2 months when Chrome updates I get banned by their
           | throttling mechanism because their extension makes too many
           | requests and they see "unusual traffic"
           | 
           | It can take 1-2 weeks to go away and be able to use it.
           | There's no way to get in contact with anyone. Tried the
           | Chrome extension email, support forums.
           | 
           | It's a good reality check. There's no real support behind it
           | and it can go away just like Google Reader did.
           | 
           | I think the motivations behind it are laudable, but they
           | should not be the answer to the actual problem.
        
           | griomnib wrote:
           | I'm fairly sure they only exist because Larry/Sergey might
           | give half a fuck if they killed it outright, and it has a
           | small enough team that the cost savings for killing aren't
           | enough for Ruth to want to make that argument.
        
         | sitkack wrote:
         | And that is semantic scholar, https://www.semanticscholar.org/
        
           | mapmeld wrote:
           | For people unfamiliar, Semantic Scholar is run by the Allen
           | Institute and has been researching accurate AI summarization
           | and semantic search for years. Also they have support for
           | author name changes.
        
             | crazygringo wrote:
             | How does it compare with Google Scholar?
             | 
             | It advertises itself as "from all fields of science" --
             | does that include fields like economics? Sociology?
             | Political science? What about law journals? In other words,
             | is the coverage as broad? And if it doesn't include certain
             | fields, where is the "science" line drawn?
             | 
             | And I'm curious if people find it to be as useful (or more)
             | just in terms of UX, features, etc.
        
               | Onawa wrote:
               | Semantic Scholar's search is pretty good, but there are
               | also a variety of other (paid) projects that expand on
               | its API. Look at tools like Scite and LitMaps for what's
               | possible with the semantic scholar dataset.
               | 
               | As for coverage, I think it focuses more on the life
               | sciences, but I'm not positive about that.
        
               | ninjin wrote:
               | They are substantially smaller in coverage, but have
               | higher quality in my experience. Remarkably, they are
               | also willing to correct their data if you notify them.
               | This of course is in stark contrast to Google Scholar
               | where the metadata of papers is frequently _wildly_
               | inaccurate. On top of this, Semantic Scholar shares their
               | underlying data (although you need to request an API
               | key). Overall, they have been growing slowly and steadily
               | over the years and I have a lot of respect for what their
               | team is doing for researchers such as myself.
               | 
               | Now for the less great.
               | 
               | They are pushing the concept of "Highly Influential
               | Citations" [1] as their _default_ metric, which to the
               | best of my knowledge is based on a _singular_ workshop
               | publication that produced a classifier trained on about
               | 500 training samples to classify citations. I am a very
               | harsh critic of any metrics for scientific impact. But
               | this is just utter madness. Guaranteeing that this metric
               | is not grossly misleading is nearly impossible and it
               | feels like the only reason they picked it is because
               | Etzioni (AI2 head) is the last author of the workshop
               | paper. It should have been _at best_ a novelty metric and
               | certainly not the default one.
               | 
               | [1]: https://webflow.semanticscholar.org/faq/influential-
               | citation...
               | 
               | Recently, they introduced their Semantic Reader
               | functionality and are now pushing it as a default way to
               | access PDFs on the website. Forcing you to click on a
               | drop down to access plain PDFs. It may or may not be a
               | great tool, but it feels somewhat obvious that they are
               | attempting to use shady patterns to push you in the
               | direction they want.
               | 
               | Lastly, they have started using Google Analytics. Which
               | is not great, but I can understand why they go for the
               | industry default.
               | 
               | Overall, I use them nearly daily and they are the best
               | offering out there for my area of research. Although, I
               | at times feel tempted to grab the data and create an
               | alternative (simpler) frontend with fewer distractions
               | and "modern" web nonsense.
        
               | crazygringo wrote:
               | Thank you so much!
        
           | bugglebeetle wrote:
           | OpenAlex is really good here too, including their API.
           | They're also the inheritors of the Microsoft Academic Graph,
           | fully open source and open data:
           | 
           | https://openalex.org
        
           | valusson wrote:
           | It's nice, but OpenAlex is better.
           | https://explore.openalex.org/ It also has a free API and
           | people have built python libraries to access it.
           | https://pypi.org/project/pyalex/
        
         | kettlecorn wrote:
         | I miss the Google of yesteryear which had an altruistic streak
         | and felt that enriching the world's ability to share and
         | process information would ultimately accrue benefit to Google
         | as well.
         | 
         | The Google of today is far more boring and less helpful.
        
           | smgit wrote:
           | It's a hard job to maintain systems in an altruistic state,
           | cause opportunists and parasites are drawn in larger and
           | larger numbers to wherever resources accumulate.
           | 
           | Google has done a decent job not turning fully into an Oracle for
           | example.
        
             | insane_dreamer wrote:
             | That's a really really low bar
        
         | consf wrote:
         | As history has shown with other Google projects, there's always
         | the potential for features to be deprioritized
        
         | BlindEyeHalo wrote:
         | computer science has dblp.org which indexes all the relevant
         | journals.
        
       | theanonymousone wrote:
       | The post uses the expression "delve into" :-/
        
         | sourcepluck wrote:
         | Is this a jokey reference to that time Paul Graham upset large
         | amounts of Nigerians on Twitter? Or, rather, genuine concern at
         | the thought that the article may have been generated by
         | chatbots?
        
           | trash_cat wrote:
           | It's because Taylor Swift's latest album uses a lot of
           | 'delve'.
        
         | Der_Einzige wrote:
         | LLMs linguistically colonized humans already so now humans use
         | LLM slop in their day-to-day verbal communications.
         | 
         | Unironically the plot of MGS5 the Phantom Pain literally
         | happened IRL. Skullface would be proud!
        
         | kome wrote:
         | lol, so what?
        
       | pkoird wrote:
       | Unpopular opinion but I really liked Microsoft Academic instead
       | until they canned it, sadly.
        
         | afandian wrote:
         | What do you make of OpenAlex, which inherited the dataset?
        
         | breuleux wrote:
         | I liked Microsoft Academic far better, if only because it
         | actually had an API.
        
       | photochemsyn wrote:
       | I've been using Google Scholar for a long time, but I'm finding
       | ChatGPT search with well-crafted prompts gets more focused and
       | relevant results than a complex keyword search on GS does.
       | However it's often still easier to find a link to the pdf version
       | of the paper using GS, but then scihub is still an option and can
       | work when all else fails.
        
       | chris_wot wrote:
       | How long till they kill it?
        
       | looneysquash wrote:
       | Oh good, it's just a celebration and not an announcement that
       | they're killing it.
        
       | rnewme wrote:
       | Time goes by fast. It's interesting to think how the author's son
       | is now 20 as well.
       | 
       | Another interesting thing is the little popup form at the end of
       | the post asking me if my opinion of Google changed for the better
       | after reading the post. I mean maybe a bit, but the form definitely
       | knocked the score back down.
        
       | agnishom wrote:
       | Google Scholar is extremely valuable to the academic community. I
       | am afraid that Google will decide to scrap it someday, and we
       | will be left with a number of inferior alternatives.
        
         | idunnoman1222 wrote:
         | Like annas archive?
        
         | domoritz wrote:
         | Semantic scholar is pretty good so I keep using it more and
         | more.
        
         | jonas21 wrote:
         | Google employs thousands of researchers who would be less
         | productive (and upset) if they scrapped it. That alone is
         | probably enough to make it worthwhile to keep it going, at
         | least until a good alternative emerges.
        
           | elAhmo wrote:
           | Given that they have killed products with millions of users,
           | including a lot of paying users, relying on this is
           | optimistic. Google doesn't seem to care about major
             | inconvenience they cause, like with the Google Domains sale
             | to Squarespace.
        
             | leemee wrote:
             | I think the point was that Google is _sometimes_ willing to
             | support projects if it helps their employees do their job,
             | which might be the case here.
        
         | jillesvangurp wrote:
         | Google employs a lot of academics that probably use it. And of
         | course they have a few AI related products that are probably
         | being trained on scientific content as well. I bet Google
         | Scholar feeds data into that effort. My guess is that keeping
         | google scholar up and running isn't breaking the bank for them
         | and it is actually a valuable resource for them.
        
         | kmmlng wrote:
         | Well, at least Google Scholar is aligned with Google's core
         | business: search. It seems silly for Google to scrap search
         | features. On the other hand, I'm not sure if Google Scholar is
         | aligned with their _real_ core business: ads.
        
         | asdff wrote:
         | IMO pubmed is superior for life sciences, especially if you use
         | their entrez direct. Really powerful query tooling.
        
       | 1propionyl wrote:
       | A reminder to everyone: if you want a "legal" copy of a paper you
       | can always just try emailing one of the first authors. They will
       | 99.99% send you back a PDF.
        
         | dredmorbius wrote:
         | Dead authors don't.
         | 
         | The friction is tremendously higher than on-demand downloadable
         | options: LibGen, SciHub, ZLibrary, Anna's Archive, or even
         | sources such as ArXiv, SocArXiv, SSRN, which are far more
         | fragmentary and limited.
        
       | ultimoo wrote:
       | "Now with AI outlines, you can quickly grasp the main points or
       | delve into specific details that pique your interest"
       | 
       | is this a nod to pg's delve blowup on twitter?
        
         | fforflo wrote:
         | Haha, that, or it's a validation of the blowup.
        
       | MollyRealized wrote:
       | The availability of case law has been a massive bonus.
        
       | consf wrote:
       | Google Scholar was an absolute lifesaver during my university
       | years! Reading this journey makes me appreciate even more how
       | much thought and effort went into creating such a valuable
       | resource. I remember the frustration of hitting paywalls or
       | struggling to track down references in the library.
        
       | codeflo wrote:
       | Pushing a half-abandoned but widely beloved project into the
       | visibility of the bean counters at Google with a birthday
       | announcement like that is a dangerous game. Best of luck.
        
         | uecker wrote:
         | Sadly, this is a very valid concern.
        
         | llm_trw wrote:
         | Google is a danger to the world, not because it's a monopoly
         | but because it makes wonderful tools that are better than
         | anything else available at the time. Everything else goes bust.
         | Then Google shutters the tool and we're left worse off than if
         | they did nothing.
        
       | 2dvisio wrote:
       | 20 years and still no API. In my past as an academic I've tried
       | several times to build systems to depend on Scholar and was
       | always taken aback by the lack of an API. I get it was not to be
       | swallowed whole by other publishers etc, but that has reduced the
       | potential of the product.
        
         | asdff wrote:
         | What field are you in? If you are in life sciences the pubmed
         | api (entrez direct) is pretty good.
        
         | mkatx wrote:
         | You mean public, documented APIs? Everything is/has an API.
        
       | PeterStuer wrote:
       | I love it when I receive a Scholar mail informing that there is a
       | new citation of a 20+ year old long forgotten paper.
        
       | foxbee wrote:
       | I found the post interestingly personable, something that I don't
       | often find with Google. I've used Google Scholar for many years,
       | before I used Elsevier and it was a gamechanger.
        
       | QuantumG wrote:
       | CiteSeer we barely knew you.
        
         | esafak wrote:
         | I'm surprised there are so few comments about it. It had more
         | features than Google Scholar.
        
       | cryptozeus wrote:
       | Slightly unrelated but I also enjoyed google's magazines section
       | 
       | https://books.google.com/books/magazines/language/en
        
       | guwop wrote:
       | for people upset with google scholar's lack of an API, check out
       | openalex! awesome project. but crazy to think how much net
       | positive google scholar has provided for the world..
        
       ___________________________________________________________________
       (page generated 2024-11-19 23:02 UTC)