[HN Gopher] 20 years of Google Scholar
___________________________________________________________________
20 years of Google Scholar
Author : thepuppet33r
Score : 391 points
Date : 2024-11-18 18:01 UTC (1 days ago)
(HTM) web link (blog.google)
(TXT) w3m dump (blog.google)
| thepuppet33r wrote:
| Yes, Google deserves to be distrusted and avoided as a whole, but
| Google Scholar is a genuinely net good for humanity.
| dumpHero2 wrote:
| I have a similar feeling about Gmail (its effective anti-spam
| engine), Google Maps and Google Docs (which pioneered shared
| docs. It feels outdated on many fronts now, but it was a
| pioneer).
| roflmaostc wrote:
| Anti-spam is only an issue if people dump their email address
| anywhere. I usually register my mail on webpages as
| first.last+webpage@mail.com, and once they spam this
| address, it gets blacklisted.
|
| I literally get only 1-3 real spam mails per month without
| any filter.
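|
| A minimal Python sketch of the per-site tagging scheme described
| above. The helper names (tagged_address, extract_tag, should_reject)
| and the blocked-tag set are hypothetical, chosen only for
| illustration; in practice this is usually done with mail filter
| rules rather than code.

```python
from typing import Optional, Set


def tagged_address(local: str, domain: str, site: str) -> str:
    """Build a per-site address such as first.last+webpage@mail.com."""
    # Keep only letters and digits so the tag is safe inside a local part.
    tag = "".join(ch for ch in site.lower() if ch.isalnum())
    return f"{local}+{tag}@{domain}"


def extract_tag(recipient: str) -> Optional[str]:
    """Return the text after the first '+' in the local part, if any."""
    local, _, _domain = recipient.rpartition("@")
    _base, plus, tag = local.partition("+")
    return tag if plus else None


def should_reject(recipient: str, blocked_tags: Set[str]) -> bool:
    """Reject mail addressed to a tag that has already attracted spam."""
    tag = extract_tag(recipient)
    return tag is not None and tag.lower() in blocked_tags


# Register with a tagged address, then block the tag once it gets spammed.
address = tagged_address("first.last", "mail.com", "webpage")
blocked = {"webpage"}
print(address, should_reject(address, blocked))
```

| Mail sent to the untagged address is not affected by the blocked
| set, which is the weakness raised later in the thread.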
| dripton wrote:
| Works great, until a page rejects email with a '+' in it.
| 6510 wrote:
| dots are ignored, can filter by john.doe@gmail.com
|
| not sure about capital letters
| hks0 wrote:
| Not everyone's cup of tea, but quite nice if one can
| afford it: I have my personal domain and a catch-all
| inbox. So if I want to register at acme-co.xyz I will
| just use acmecoxyz@my-domain.tld
|
| Maybe I should start using random words though? Wonder if
| someone will go bananas seeing their brand's name on my
| domain.
| kroltan wrote:
| Yeah, I've had to explain that a couple times already,
| usually when dealing with customer support or in-person
| registrations.
|
| And a "malicious" actor can get away with pretending to
| be another company by spoofing the username if they know
| your domain works like that. I don't think this has
| reached spammers' repertoire yet, but I wouldn't be
| surprised.
|
| Eventually I'd like to have a way of generating random
| email addresses that accept mail on demand, and put
| everything else in quarantine automatically.
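|
| One way the "random addresses on demand" idea could look, as a rough
| Python sketch. The AliasBook class and its methods are hypothetical,
| and the routing decision is only modeled as a string; wiring it into
| a real mail server is outside the scope of this sketch.

```python
import secrets


class AliasBook:
    """Tracks which randomly generated aliases currently accept mail."""

    def __init__(self, domain: str) -> None:
        self.domain = domain.lower()
        self.active = {}  # alias address -> free-text purpose label

    def new_alias(self, purpose: str) -> str:
        # 8 random bytes give 16 hex characters; the address itself
        # reveals nothing about which company it was handed to.
        address = f"{secrets.token_hex(8)}@{self.domain}"
        self.active[address] = purpose
        return address

    def revoke(self, address: str) -> None:
        # Stop accepting mail for an alias that has leaked.
        self.active.pop(address.lower(), None)

    def route(self, recipient: str) -> str:
        """Return 'inbox' for a known alias and 'quarantine' otherwise."""
        return "inbox" if recipient.lower() in self.active else "quarantine"


book = AliasBook("my-domain.tld")
alias = book.new_alias("acme-co.xyz signup")
print(book.route(alias))                       # known alias
print(book.route("acmecoxyz@my-domain.tld"))   # guessed name, never issued
```

| Because only issued aliases reach the inbox, a sender who guesses a
| company-style username lands in quarantine instead.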
| AshamedCaptain wrote:
| Or just knows about this Gmail trick (it's been 20 years
| already) and sends spam to your real mailbox.
|
| Actually, I am surprised _any_ spammy website these days
| would even honor the part after the +, and not just
| directly send to the real mailbox name.
| thechao wrote:
| I used to require a "+..." on all emails. Any email that
| didn't have the "+..." was sent to Spam automagically. My
| family were whitelisted. I gave up, because too many
| websites (early on) refused to take the "+..." marker, so
| I ended up losing too much to Spam. It's easier to just
| let Google sort it out.
| gnopgnip wrote:
| It's part of RFC 5233 Sieve Email Filtering: Subaddress
| Extension
| aorth wrote:
| Good resource on this trick from 2010. It's not Gmail
| specific.
|
| https://people.cs.rutgers.edu/~watrous/plus-signs-in-
| email-a...
| janalsncm wrote:
| I see this recommendation everywhere and I am genuinely
| surprised that it works. Any spammer can find out your real
| address since there is an obvious mapping from + addresses
| to your real address. An actual solution would hide this
| mapping.
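|
| To make the "obvious mapping" concrete, here is a short Python
| sketch. The function name strip_subaddress is hypothetical. It
| assumes the provider treats everything after the first '+' as a tag,
| and it applies the dot-removal rule only to gmail.com, where dots in
| the local part are ignored.

```python
def strip_subaddress(address: str) -> str:
    """Map a tagged address back to the underlying mailbox."""
    local, _, domain = address.strip().lower().rpartition("@")
    # Drop the '+tag' suffix, if present.
    base = local.split("+", 1)[0]
    # Gmail ignores dots in the local part; other providers may not.
    if domain == "gmail.com":
        base = base.replace(".", "")
    return f"{base}@{domain}"


print(strip_subaddress("John.Doe+shop@gmail.com"))
print(strip_subaddress("first.last+webpage@mail.com"))
```

| A sender only needs these few lines to recover the real address,
| which is why randomly generated aliases hide the mapping better.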
| bachmeier wrote:
| Yeah. Fastmail masked addresses are random. The best you
| can do is guess that an address might be masked, due to
| it not being johnsmith@fastmail.com, but it provides no
| information about your real email address.
| JW_00000 wrote:
| Too late for most people.
| coderintherye wrote:
| Good for users of Gmail, but is it a net good? Gmail spam
| prevention is great for the Google Apps orgs I manage.
| However, for the other inboxes the vast majority of spam they
| receive comes from @gmail.com
| thaumasiotes wrote:
| > Gmail spam prevention is great for the Google Apps orgs I
| manage.
|
| Gmail is unlikely to let spam through.
|
| But that doesn't make its spam filter great; it's also very
| prone to blocking personal communication on the grounds
| that it must actually have been spam. The principle of
| gmail's spam filter is just "don't let anything through".
|
| It would be much better to get more spam and also not have
| my actual communications disappear.
| whiplash451 wrote:
| Try MS OneDrive before calling google docs outdated
|
| Google spanks everyone else on robustness and responsiveness
| rty32 wrote:
| Yes until it fails
|
| https://www.theverge.com/2023/11/27/23978591/google-drive-
| de...
| whiplash451 wrote:
| That issue got resolved in a few days [1] -- and for each
| and every one of these extremely rare events at Google,
| you'll find similar ones at MS.
|
| I am referring to robustness at scale and every day:
| Google released auto-save years before MS. MS pales in
| comparison in the UX.
|
| Note: I have no vested interest in Google, not ex-
| googler, etc.
|
| [1]
| https://support.google.com/drive/thread/245861992/drive-
| for-...
| Fogest wrote:
| As much as I try to "de-google" myself and try to avoid
| being trapped in the Google eco system, I'd definitely
| choose it over MS Office. I am stuck in the MS Office eco
| system at work. Some of their products are starting to
| improve in MS Office, but you can still tell it's a lot of
| hacks on top of old systems. Especially when it comes to the
| whole teams/onedrive/sharepoint side of things.
|
| One of my biggest gripes right now is that we heavily rely
| on Microsoft Teams. A lot of our work laptops are still
| stuck on 8 GB of RAM. I find Microsoft Teams can easily suck
| back a full gig or more of RAM, especially when in a video
| call. From my understanding, Teams is running essentially
| like an Electron app (except using an Edge browser
| packaged).
|
| I have no problem with web based apps, but man, some
| optimization is called for.
| nextos wrote:
| I use a decade-old NUC with plenty of RAM as a daily
| driver. It doesn't struggle with anything except MS
| Teams. It can churn through Zoom or Meet calls while
| compiling code. Teams is a bloated mess that makes the
| fans spin at max RPM.
|
| It's crazy I can boot a kernel, with an entire graphics
| and network stack, X and a terminal in less than 200 MB
| but then the Teams webapp uses a massive amount of
| resources and grinds everything else to a halt.
|
| Word 365 also becomes incredibly laggy on long documents
| with tons of comments, whereas Google Docs is just fine.
| But, apparently, this is also a thing on modern hardware.
| I guess these days Microsoft has little attention to
| detail.
| Fogest wrote:
| It's funny because sometimes Teams uses more resources
| than the Edge browser. Despite Teams being Edge based for
| their application.
|
| I think overall many companies have gotten lazy/sloppy
| when it comes to optimization. Game dev is even worse for
| this. I like how Microsoft products integrate with each
| other, but often the whole thing feels sloppy and
| unoptimized.
| globular-toast wrote:
| Google maps would only be a net good if the data was
| available under a free licence. As it is they take data from
| people that should have gone to a public project like
| OpenStreetMap.
| arccy wrote:
| "take", these people would never have produced any data if
| gmaps wasn't there...
| hatthew wrote:
| At one point I contributed quite a bit to google maps,
| because it was the primary map system I was using at the
| time. Had I been using an OSM-based system, I would have
| made contributions there instead.
| arccy wrote:
| indeed, osm can't paint itself like a victim, it needs
| good end products to bring in contributors.
| wbl wrote:
| I ran into trouble because Open Topo does not report a
| stream the 7.5" series does. There are serious data quality
| issues that can make it not work for some applications.
| gray_-_wolf wrote:
| Most of the spam I get is _from_ gmail. Maybe they should
| apply their so effective spam engine to outgoing mail as
| well...
| crazygringo wrote:
| It's probably not. You can put any domain you want on the
| "from" address. Just because it says it was from Gmail
| doesn't mean it actually was, unless it's signed with DKIM
| etc.
|
| I had a domain for a while that people got spam "from" all
| the time. It had nothing to do with me and there was
| nothing I could do about it.
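|
| A rough Python sketch of the check implied above: compare the domain
| in the visible "From" header with the domains that passed DKIM
| according to the Authentication-Results header. The function names
| and sample messages are hypothetical. The sketch does not verify any
| signature itself; it assumes the Authentication-Results header was
| added by your own receiving server, and it uses strict domain
| equality rather than the relaxed alignment that DMARC permits.

```python
import re
from email import message_from_string
from email.utils import parseaddr


def from_domain(msg) -> str:
    """Domain of the address shown in the From header, lowercased."""
    _name, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower()


def dkim_pass_domains(msg) -> set:
    """Signing domains (header.d) reported with dkim=pass."""
    domains = set()
    for header in msg.get_all("Authentication-Results", []):
        for method in header.split(";"):
            if re.search(r"\bdkim\s*=\s*pass\b", method, re.I):
                found = re.search(r"header\.d\s*=\s*([^\s;]+)", method, re.I)
                if found:
                    domains.add(found.group(1).lower())
    return domains


def from_matches_dkim(raw: str) -> bool:
    """True if the From domain is among the DKIM-verified domains."""
    msg = message_from_string(raw)
    return from_domain(msg) in dkim_pass_domains(msg)


SPOOFED = (
    "Authentication-Results: mx.example.net; dkim=none; spf=fail\n"
    "From: Someone <someone@gmail.com>\n"
    "Subject: invoice\n\nbody\n"
)
GENUINE = (
    "Authentication-Results: mx.example.net; dkim=pass header.d=gmail.com\n"
    "From: Someone <someone@gmail.com>\n"
    "Subject: hello\n\nbody\n"
)
print(from_matches_dkim(SPOOFED), from_matches_dkim(GENUINE))
```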
| dpifke wrote:
| I run mail servers for myself, a couple of side projects,
| and some friends and family. A double-digit percentage of
| all spam caught by my filters is from Google's mail
| servers, not just forged @gmail.com addresses.
|
| Of the "too big to block outright" spam senders, behind
| Twilio Sendgrid and Weebly, Google is currently #3.
| Amazon is a close #4. None of the top four currently have
| useful abuse reporting mechanisms... Sendgrid used to be
| OK, but they no longer seem to take any action. Google
| doesn't even accept abuse reports, which is ironic
| because "does not accept or act upon abuse reports" is a
| criterion for being blocked by Google.
|
| Most spam from Google is fake invoices and 419 scams.
| This is trivially filtered on my end, which makes it
| perplexing Google doesn't choose to do so. I can
| guarantee that exactly 0% of Gmail users sending out
| renewal invoices for "N0rton Anti-Virus" are legitimate.
| gray_-_wolf wrote:
| I would hope google has DKIM and SPF set.
| csomar wrote:
| The Spam I get from "gmail" and ends up in my spam folder
| is spoofed. The Spam I get from gmail and ends up in my
| inbox _is_ from gmail. Spammers will mass-create accounts
| and mass-sell them to spammers.
| AlienRobot wrote:
| "Google is evil, except for all the Google products Google
| produced"
|
| Honestly, if we compare Google to Amazon, Microsoft, Apple,
| and Meta, isn't Google the least evil one?
| insane_dreamer wrote:
| No, I'd put them in this descending order of evilness:
| Meta, Amazon, Microsoft, Google, Apple.
| asdff wrote:
| Do people use shared docs often in the workplace? I only used
| it on like two group projects in school and it probably made
| things more clunky than if we just wrote our portions and
| compiled them after. Maybe it works for some workflows but
| having multiple people editing the same document is chaotic,
| unless you delegate who does what, at which point there's no
| point in having it be a shared doc when the responsibilities
| are delegated.
| mwest217 wrote:
| All the time, it is incredibly useful to send a doc for
| comments, which can be attached to the relevant piece of
| text. I use shared editing less often, but I find it's
| especially useful in incident response where there may be
| multiple investigation workstreams, and the incident
| commander needs to be able to see all of them.
| guappa wrote:
| Nah google maps shows/hides things with very obscure logic.
|
| Like you can ask to find a restaurant and it won't point you
| to the closest one but to one that is a few km away instead.
| insane_dreamer wrote:
| Google Maps is a net positive as well
| robertlagrant wrote:
| Some more:
|
| - Google Search
|
| - YouTube (more debatable, but I think it's a marvel)
|
| - Google Books
|
| - ChromeBooks
|
| - Android
|
| - Google Calendar
|
| - Google Earth
|
| - Google Drive
|
| - Google Docs
|
| - Waze
|
| - Android Auto
|
| - Google Pay
|
| - Kubernetes
|
| - Go
|
| - VP8 / VP9
|
| I'd rather take all those products than leave them.
| insane_dreamer wrote:
| OK, but Search aside (which is Google's primary product; we
| were talking about side projects), many of these are also-rans;
| they didn't really change the landscape the way Google Maps
| (and of course Search) did. OK, maybe Android, but
| that wasn't developed by Google. Neither was YouTube
| (groundbreaking), or Waze (not groundbreaking).
|
| The only ones I would take from your list would be
| Kubernetes and Google Earth, and Kubernetes, being more of a
| dev tool, wouldn't really count as far as impact and usefulness
| to society (Go would fit there).
|
| Google Books _could_ have been great, but Google didn't
| take care of it. Same with Google Reader.
| hfsh wrote:
| >which is Google's primary product
|
| Which _used_ to be Google's primary product, way waaaay
| back when. Their primary product now is advertising, and
| has been for a very long time.
| codeflo wrote:
| I'll reserve judgement on its net effect until the moment they
| kill it.
| renewiltord wrote:
| Google Scholar is fantastic stuff. I am so grateful for it. It's
| crazy how easy it is to find papers these days by just going to
| it. University library search functions are completely useless in
| comparison.
| elashri wrote:
| I did not know about the Scholar PDF Reader extension [1].
| Unfortunately, the reason is that I only use Firefox (and Safari
| on iOS), and it is not available there. The AI outlines would be
| useful and I can see myself using them.
|
| I do not want to comment on number 20. I really wished that I
| joined CERN 10 years earlier but then it is the mistake of my
| parents :)
|
| [1] https://chromewebstore.google.com/detail/google-scholar-
| pdf-...
| dctoedt wrote:
| I'd not known about "F.D.C. Willard" -- the _nom de plume_ of a
| Michigan State physics professor's Siamese cat, Chester -- who
| was listed as a co-author of a number of the professor's physics
| papers.
|
| More on Chester and his co-author status:
| https://en.wikipedia.org/wiki/F._D._C._Willard
| mananaysiempre wrote:
| 21. Google Scholar will deny access to you if you (need to) self-
| host a VPN on a common VPS provider. Being a Google product, it
| also can't be special-cased in your routing table. (I genuinely
| had to retrain myself to use Google Scholar again once I no
| longer had that need.)
|
| 22. Switching on sort by date will impose a filter to papers
| published within the year, and you cannot do anything about that.
| eesmith wrote:
| > 22. Switching on sort by date will impose a filter to papers
| published within the year, and you cannot do anything about
| that.
|
| !!! And here I thought it's been broken for years, and a sign
| of decay due to lack of internal support.
| buildbot wrote:
| I swear this was working for me until literally today, it was
| really useful to find older ML papers?!
| mananaysiempre wrote:
| There is _filter_ by date and _sort_ by date. The former
| works. The latter, when enabled, even adds a banner on top
| of the page (in large but gray type) that says "Articles
| added in the last year, sorted by date", and resets any
| filter you might have set before.
| MichaelZuo wrote:
| Was this change ever logged or noted some way? Or did it
| just show up one day?
| philipkglass wrote:
| If it ever returned time-sorted results without limit,
| that was long in the past. It has truncated results to
| one year for the last several years I have used Scholar.
| crazygringo wrote:
| It seems so intentionally "broken", I can only guess it
| is to prevent scraping? Since searching for generic-ish
| search terms and sorting by date is a common scraping
| strategy.
|
| Still, you'd think they'd do a cutoff of e.g. 500 or
| 1,000 items rather than filter by the past year.
|
| So I can't help but wonder if it's a contractual
| limitation insisted on by publishers? Since the
| publishers also don't want all their papers being
| spidered via Scholar? It feels kind of like a limitation
| a lawyer came up with.
| eesmith wrote:
| Unlikely, since the easy work-around for scrapers is to
| search by date range and grab things that way. That's
| what I do now manually.
| asdff wrote:
| pubmed is literally built for academic scraping. It even
| has a command line interface to access it. If publishers
| were worried about scraping they'd target that, but they
| don't. In fact when papers go on pubmed after a year they
| are rehosted by pubmed central and made freely available
| to anyone in the world.
| svat wrote:
| Related: 2014 article by Steven Levy, titled "The Gentleman Who
| Made Scholar": https://www.wired.com/2014/10/the-gentleman-who-
| made-scholar...
| Thrymr wrote:
| > Would he want to continue working on Scholar for another ten
| years? "One always believes there are other opportunities, but
| the problem is how to pursue them when you are in a place you
| like and you have been doing really well. I can do problems
| that seem very interesting to me -- but the biggest impact I can
| possibly make is helping people who are solving the world's
| problems to be more efficient. If I can make the world's
| researchers ten percent more efficient, consider the cumulative
| impact of that. So if I ended up spending the next ten years
| doing this, I think I would be extremely happy."
|
| Has he still been working on it in the 10 years since this
| article? His name is in the byline of the new blog post, but
| it's not clear from that how much he's been working on it.
| the-rc wrote:
| 12-13 years ago, I ran the system that inlined Scholar and
| other results on the main search result pages. Anurag was
| still involved, but AFAIR Alex, the other author of the post
| who also had been there from the start, worked on most code
| changes. I would guess that things are more or less the same
| today. (Because it had such limited headcount, Scholar was
| known to lag behind other services when it came to
| code/infrastructure migrations.)
| jll29 wrote:
| Thanks for that inside scoop, even if it's a bit dated; I
| wonder if they read this discussion, perhaps.
|
| An important feature request would be a view where only
| peer-reviewed publications (specifically, not ArXiv and
| other pre-print archives) are included in the citation
| counts, and self-citations are also excluded.
|
| A way to download all citation sources would also be a
| great nice-to-have.
| zeroonetwothree wrote:
| Google Scholar is so good. I started doing research right when it
| came out and it was amazingly helpful. I can't imagine how it was
| done before.
| IshKebab wrote:
| There are alternatives, like Web of Knowledge. You basically
| need to be in a Uni for that though.
| leephillips wrote:
| I would go to the library and pull volumes of _Science Citation
| Index_ off the shelves. Yes, Google Scholar was a revolution.
| dekhn wrote:
| I'd go to the card catalog (index), turn my question into a bag
| of words (tokenize), fetch all the cards matching each token
| (posting lists), drop cards which didn't include enough of the
| tokens (posting list intersection), ordering the cards by the
| number of tokens they matched (keyword match ranking), filter
| at some cutoff, and then reorder based on the h-index of the
| author (page rank). Then I would read each paper in order,
| following citations in a breadth-first manner.
|
| (the above is a joke comparing old school library work to
| search engines circa 2000; I didn't actually do all those
| steps. I'd usually just find the most recent review article and
| read the papers it cited).
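|
| The pipeline in the joke above maps onto a small inverted index. The
| following Python sketch follows the same steps; the function names,
| the sample "cards", and the authority scores standing in for h-index
| are all hypothetical, and this is not a description of how any real
| search engine is implemented.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Turn text into a bag (set) of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(cards):
    """Card catalog: map each token to the set of cards containing it."""
    index = defaultdict(set)
    for card_id, text in cards.items():
        for token in tokenize(text):
            index[token].add(card_id)  # posting list for this token
    return index


def search(index, query, min_match=2, cutoff=10, authority=None):
    authority = authority or {}
    matches = Counter()
    # Fetch the posting list for each query token and count the hits.
    for token in tokenize(query):
        for card_id in index.get(token, ()):
            matches[card_id] += 1
    # Drop cards that matched too few tokens.
    kept = [c for c, n in matches.items() if n >= min_match]
    # Rank by number of matched tokens, then cut off.
    ranked = sorted(kept, key=lambda c: (-matches[c], c))[:cutoff]
    # Reorder the survivors by author authority, ties by match count.
    return sorted(ranked, key=lambda c: (-authority.get(c, 0), -matches[c], c))


cards = {
    "A": "Exchange effects in solid helium",
    "B": "Magnetic properties of solid helium at low temperature",
    "C": "Graphene exchange",
    "D": "Levitating frogs",
}
index = build_index(cards)
print(search(index, "solid helium exchange"))
print(search(index, "solid helium exchange", authority={"A": 12, "B": 40}))
```

| Without authority scores the card matching the most tokens comes
| first; with them, the final reordering can promote a weaker keyword
| match, which is the effect the joke attributes to page rank.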
| asdff wrote:
| I had an old boss who did it in the old analog way. He had a
| secretary handle his email and transcribing stuff he hand
| wrote. He had print subscriptions to a couple nature journals,
| science, and a couple research niche specific journals and he
| read them basically cover to cover. He'd attend conferences and
| had many collaborators who would send him papers from their own
| lab to opine on.
|
| I actually respect this style a lot. There is a firehose of
| papers coming onto google scholar each day. You type in some
| keyword you get 500 hits. This cut that down substantially for
| him in a way where he never missed anything big (reading nature
| and science), kept up with what the field has been doing
| (reading the more niche specific journals and keeping up with
| the labs who put out this niche work), and seeing what was
| coming up in the pipeline from the conferences or what sort of
| research new grants were requesting. I'm not sure that scholar
| would have helped much.
| kylebenzle wrote:
| I was hoping it would be 20 tips and tricks on how to use the
| service better not random fun facts about its history :-(
| chromatin wrote:
| 21. No API
| malshe wrote:
| I use Google Scholar daily and it's been a fantastic resource.
| Google Scholar with Zotero completes my articles search and
| storage.
|
| Btw, Anurag's last name is misspelt under the picture. It reads
| "Achurya" instead of "Acharya"
|
| Edit: They fixed it
| lbeckman314 wrote:
| > 18. A paw-sitive contribution to Physics. F.D.C Willard
| (otherwise known as Chester, the Siamese cat) is listed as a co-
| author on an article entitled: "Two, Three, and Four-Atom
| Exchange Effects" that explores the magnetic properties of solid
| helium-3 and how interactions between its atoms influence its
| behavior at extremely low temperatures. Chester's starring role
| came about because his co-author/owner, Jack H. Hetherington
| wrote the entire paper with the plural "we" instead of a single
| "I."
|
| ---
|
| 'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J.
| H. Hetherington and F. D. C. Willard [0, 1, 2]
|
| [0]
| https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...
|
| [1] https://xkeys.com/about/jackspages/fdcwillard.html
|
| [2] https://en.wikipedia.org/wiki/F._D._C._Willard
| lr1970 wrote:
| Sir Andre Geim [0], the only person in the world to have received
| both the real Nobel Prize in Physics and the Ig Nobel Prize,
| co-authored one of his articles [1] with his hamster Tisha.
|
| [0] https://en.wikipedia.org/wiki/Andre_Geim
|
| [1]
| https://repository.ubn.ru.nl//bitstream/handle/2066/249681/2...
| russellbeattie wrote:
| Huh. I tried the "Listen to article" button, because I knew it
| was going to be generated and was curious to hear how it sounded.
|
| Interestingly, it highlighted the words as it read. I haven't
| seen that before online. Not sure how useful it is (especially
| for anyone interested in this particular topic), but I thought it
| was a neat innovation nevertheless.
| gexaha wrote:
| The most fun fact is that it still exists!
| robwwilliams wrote:
| Our department uses GScholar as a great research-focused CV
| generator. Not used formally except that faculty pages have a
| link to their GS pages.
| wseqyrku wrote:
| For a second I thought this was buzzfeed for some reason.
| GeoAtreides wrote:
| oh no
|
| they remembered google scholar exists
|
| it's a great product and I don't trust google at all not to break
| it or mess with it
| crazygringo wrote:
| Google employs a lot of people from academia. Scholar is used
| and loved by a _lot_ of people _within_ Google. It's been
| around for two decades. I really don't think it's going
| anywhere.
| dekhn wrote:
| Reader was used and loved by a LOT of people WITHIN google,
| but it was shut down (and the leadership that loved it even
| made arguments in front of the company why it "had to be shut
| down").
|
| AFAICT Scholar remains because Anurag built up massive cred
| in the early years (he was a critically important search
| engineer) with Larry Page and kept his infra costs and
| headcount really small, while also taking advantage of search
| infra.
| crazygringo wrote:
| If it matters, they cited declining usage of Reader as a
| reason for shutting it down.
|
| It seems like Scholar has an overall upward trend, although
| their methodology notes make it hard to compare some
| periods directly:
|
| https://trends.google.com/trends/explore?date=all&q=%2Fm%2F
| 0...
|
| I'm basically assuming this is the rate of growth of
| graduate school, and no competing products have had any
| real effect?
| dekhn wrote:
| Reader usage was declining because the application was
| not being developed. The other reason they mentioned is
| that it would have required a lot of work to rewrite the
| app to be consistent with the new user data policies
| being put into place.
| afandian wrote:
| Some fun Google Scholar history from another perspective.
|
| https://youtu.be/DZ2Bgwyx3nU?t=315
|
| I recommend you watch the rest of the video, on the subject of
| open/closed and enclosure of infrastructure.
| teruakohatu wrote:
| The best thing, by a long way, that Google Scholar has achieved
| is denying Elsevier & co a monopoly on academic search.
|
| In most universities here in New Zealand, articles have to be
| published in a journal indexed by Elsevier's Scopus. If it is not
| in a Scopus-indexed journal, it does not count any more than a
| Reddit comment. This gives Elsevier tremendous power. But in CS/ML/AI
| most academics and students turn to Google Scholar first when
| doing searches.
| freefaler wrote:
| or turn to sci-hub and annas-archive :)
| philipkglass wrote:
| You use Google Scholar to find papers you're interested in,
| then use sci-hub to actually read them.
| freefaler wrote:
| indeed... and use Zotero with the correct plugin to
| download them automagically
| epcoa wrote:
| sci-hub hasn't been updated in 4 years and the sources
| for annas-archive like nexus-stc are seriously hit or
| miss (depends on the field).
| freefaler wrote:
| Nothing lasts forever, but the model of buying a paper
| for 40$ from Elsevier isn't much better. Depending on the
| field there are other sources, but still a hit rate is
| about 85-90%.
| mateus1 wrote:
| Any alternatives?
| consf wrote:
| I remember that when I no longer had access to university
| subscriptions, Sci-Hub was my salvation during those times
| when I had no money of my own
| orochimaaru wrote:
| Not sure if researchgate is still a thing. I had it and
| uploaded all my papers there. They show up automatically on
| Google. I believe this is allowed since you're allowed to
| share copies of your publication on your website.
|
| The problem is my researchgate account was connected to my
| academic account. It's been a while since I graduated so
| I've lost access to my own publications and page.
|
| But I used to use researchgate and requests in researchgate
| quite a bit.
| teruakohatu wrote:
| Does sci-hub have up to date content these days?
|
| Having pretty wide journal access through my institution
| means I don't need to reach out to sci-hub.
| epcoa wrote:
| sci-hub proper hasn't been updated since its indefinite
| pause in December 2020. Alternatives are of variable
| success depending on field. It might be better for CS/Math,
| but for medicine and life sciences it's pretty bad.
| whimsicalism wrote:
| i believe they paused due to an indian court injunction
| and the case was heard this year, does anyone know any
| update?
| insane_dreamer wrote:
| How would an Indian court case have any jurisdiction in
| Russia (not to mention mirrors)?
| cipheredStones wrote:
| Sci-Hub complied with the order with the intent to
| actually argue their case (and possibly establish a legal
| justification for the site), rather than just defying the
| order and continuing to play cat-and-mouse with every
| authority.
| joshuaissac wrote:
| And this is because they have a chance of winning. The
| same court has previously adopted a broad interpretation
| of what constitutes fair dealing.
|
| https://en.wikipedia.org/wiki/University_of_Oxford_v._Ram
| esh...
| whimsicalism wrote:
| scihub is dying unfortunately :( the good news is it is
| happening just as all the fields i'm interested in except for
| some experimental physics & biology have moved to OA
| Loughla wrote:
| oa resources have really kicked it into high gear post
| covid. They used to be kind of a joke, but they're actually
| competitive now. It's nice to see.
| Onawa wrote:
| I believe NIH's directive that all intramural and
| extramural research must be published OA has helped move
| things in that direction quite a lot.
| kedarkhand wrote:
| Sorry but what is OA?
| bloak wrote:
| https://en.wikipedia.org/wiki/Open_access (I assume)
| thrdbndndn wrote:
| I'm a proud user of sci-hub, but when I was still in
| academia, I never used it. My school has access to all
| the journals I ever needed, plus more old non-digitized ones
| I can borrow from library (including interlibrary access).
| thrw42A8N wrote:
| My school has no such thing and yet requires me to find and
| cite research.
| consf wrote:
| I think access to research shouldn't be a luxury or
| dependent on where you study
| slashtab wrote:
| Reminds me of Aaron Swartz.
| Melatonic wrote:
| The legend
| consf wrote:
| For those outside such institutions, tools like Sci-Hub
| often become a lifeline (as it was for me)
| ryzvonusef wrote:
| It depends on the discipline, also the mode of learning
| (I'm distance learning so no physical library access).
|
| My uni (Northampton) has access to a LOT of journals... but
| has a blindspot in management, specifically accountancy
| focus journals; am doing my lit review for my MSc
| dissertation and the number of times I hit a dead end is
| frustrating.
|
| Sci-hub and Annas-Archive are also not interested in that
| segment, so double whammy.
|
| But surprisingly Archive.org was able to help me out a bit,
| so thanks for that.
| Suppafly wrote:
| >My school has access to all the journals I ever needed
|
| I miss being on a university network and having paywalled
| journals and such just magically load.
| p4bl0 wrote:
| Yet it still participates in and encourages the bibliometrics
| game, which benefits the big publishers.
|
| A simple way to make a step away from encouraging bibliometrics
| (which would be a step in the right direction) would be to list
| publications by date (most recent first) on authors pages
| rather than by citations count, or at least to let either users
| and/or authors choose the default sorting they want to use
| (when visiting a page for users, for their page by default for
| authors).
| Scriddie wrote:
| this^10
| SideQuark wrote:
| > the bibliometrics game
|
| Bibliometrics, in use for over 150 years now, is not a game.
| That's like arguing there is no value in the PageRank
| algorithm, and no validity to trying to find out which
| journals or researchers or research teams publish better
| content using evidence to do so.
|
| > which benefits the big publishers
|
| Ignoring that it helps small researchers seems short sighted.
|
| > A simple way to make a step ... would be to list
| publications by date
|
| It's really that hard to click "year" and have that sorted?
|
| It's almost a certainty when someone is looking for a
| scholar, they are looking for more highly cited work than
| not, so the default is probably the best use of readers' time.
| I absolutely know when I look up an author, I am interested
| in what other work they did that is highly regarded more than
| any other factor. Once in a while I look to see what they did
| recently, which is exactly one click away.
| mindcrime wrote:
| To be fair, you did hedge and say "almost a certainty" and
| maybe that's true. But speaking for myself, I generally
| couldn't care less about citation count. If anything, my
| interest in a document may be inversely proportional to the
| citation count. And that's because I'm often looking for
| either a. "lost gems" - things that are actually
| great/useful research, but that got overlooked for whatever
| reason, or b. historical references to obscure topics that
| I'm deep-diving into.
|
| BUT... I'm not in formal academia, I care very little about
| publishing research myself (at least not from a
| bibliometric perspective. For me "publishing" might be
| writing a blog post or maybe submitting a pre-print
| somewhere) so I'm just not part of that whole
| (racket|game|whatever-you-want-to-call-it).
| jrochkind1 wrote:
| > 1. The team started with just two of us.
|
| My guess for a while has been that it was back to two of them! if
| that!
| p4bl0 wrote:
| I wish GScholar wouldn't embrace bibliometrics so much. Sort
| papers by date (most recent papers first) by default on an
| author's page rather than by citation count, or at least give
| authors the choice to individually opt in to sort by date by
| default.
| random3 wrote:
| Fun fact about Google Scholar: it's "free", but it's just another
| soulless Google product - no clear strategy, no support, and a
| fragile proprietary dependency in what should be an open
| ecosystem. This creates inherent risks for the academic
| community. We need the equivalent of arXiv for Google Scholar.
| afandian wrote:
| The Invest in Open site has a good directory of open tools.
|
| https://infrafinder.investinopen.org/solutions
| kergonath wrote:
| Yes. On one hand I'd like Google to improve things a bit. There
| are some rough edges, which is a shame because it indexes some
| things that are not in Scopus or Web of Knowledge, like theses
| and preprint repositories. On the other hand I worry that some
| manager somewhere would kill it if they realised that it is
| still around.
| random3 wrote:
| Every 1-2 months when Chrome updates I get banned by their
| throttling mechanism because their extension makes too many
| requests and they see "unusual traffic".
|
| It can take 1-2 weeks to go away and be able to use it.
| There's no way to get in contact with anyone. Tried the
| Chrome extension email, support forums.
|
| It's a good reality check. There's no real support behind it
| and it can go away just like Google Reader did.
|
| I think the motivations behind it are laudable, but they
| should not be the answer to the actual problem.
| griomnib wrote:
| I'm fairly sure they only exist because Larry/Sergey might
| give half a fuck if they killed it outright, and it has a
| small enough team that the cost savings for killing aren't
| enough for Ruth to want to make that argument.
| sitkack wrote:
| And that is semantic scholar, https://www.semanticscholar.org/
| mapmeld wrote:
| For people unfamiliar, Semantic Scholar is run by the Allen
| Institute and has been researching accurate AI summarization
| and semantic search for years. Also they have support for
| author name changes.
| crazygringo wrote:
| How does it compare with Google Scholar?
|
| It advertises itself as "from all fields of science" --
| does that include fields like economics? Sociology?
| Political science? What about law journals? In other words,
| is the coverage as broad? And if it doesn't include certain
| fields, where is the "science" line drawn?
|
| And I'm curious if people find it to be as useful (or more)
| just in terms of UX, features, etc.
| Onawa wrote:
| Semantic Scholar's search is pretty good, but there are
| also a variety of other (paid) projects that expand on
| its API. Look at tools like Scite and LitMaps for what's
| possible with the semantic scholar dataset.
|
| As for coverage, I think it focuses more on the life
| sciences, but I'm not positive about that.
| ninjin wrote:
| They are substantially smaller in coverage, but have
| higher quality in my experience. Remarkably, they are
| also willing to correct their data if you notify them.
| This of course is in stark contrast to Google Scholar
| where the metadata of papers is frequently _wildly_
| inaccurate. On top of this, Semantic Scholar shares their
| underlying data (although you need to request an API
| key). Overall, they have been growing slowly and steadily
| over the years and I have a lot of respect for what their
| team is doing for researchers such as myself.
|
| Now for the less great.
|
| They are pushing the concept of "Highly Influential
| Citations" [1] as their _default_ metric, which to the
| best of my knowledge is based on a _singular_ workshop
| publication that produced a classifier trained on about
| 500 training samples to classify citations. I am a very
| harsh critic of any metrics for scientific impact. But
| this is just utter madness. Guaranteeing that this metric
| is not grossly misleading is nearly impossible and it
| feels like the only reason they picked it is because
| Etzioni (AI2 head) is the last author of the workshop
| paper. It should have been _at best_ a novelty metric and
| certainly not the default one.
|
| [1]: https://webflow.semanticscholar.org/faq/influential-
| citation...
|
| Recently, they introduced their Semantic Reader
| functionality and are now pushing it as a default way to
| access PDFs on the website. Forcing you to click on a
| drop down to access plain PDFs. It may or may not be a
| great tool, but it feels somewhat obvious that they are
| attempting to use shady patterns to push you in the
| direction they want.
|
| Lastly, they have started using Google Analytics. Which
| is not great, but I can understand why they go for the
| industry default.
|
| Overall, I use them nearly daily and they are the best
| offering out there for my area of research. Although, I
| at times feel tempted to grab the data and create an
| alternative (simpler) frontend with fewer distractions
| and "modern" web nonsense.
| crazygringo wrote:
| Thank you so much!
| bugglebeetle wrote:
| OpenAlex is really good here too, including their API.
| They're also the inheritors of the Microsoft Academic Graph,
| fully open source and open data:
|
| https://openalex.org
| valusson wrote:
| It's nice, but OpenAlex is better.
| https://explore.openalex.org/ It also has a free API and
| people have built python libraries to access it.
| https://pypi.org/project/pyalex/
| kettlecorn wrote:
| I miss the Google of yesteryear which had an altruistic streak
| and felt that enriching the world's ability to share and
| process information would ultimately accrue benefit to Google
| as well.
|
| The Google of today is far more boring and less helpful.
| smgit wrote:
| It's a hard job to maintain systems in an altruistic state,
| because opportunists and parasites are drawn in larger and
| larger numbers to wherever resources accumulate.
|
| Google has done a decent job of not turning fully into an Oracle for
| example.
| insane_dreamer wrote:
| That's a really really low bar
| consf wrote:
| As history has shown with other Google projects, there's always
| the potential for features to be deprioritized
| BlindEyeHalo wrote:
| computer science has dblp.org which indexes all the relevant
| journals.
| theanonymousone wrote:
| The post uses the expression "delve into" :-/
| sourcepluck wrote:
| Is this a jokey reference to that time Paul Graham upset large
| amounts of Nigerians on Twitter? Or, rather, genuine concern at
| the thought that the article may have been generated by
| chatbots?
| trash_cat wrote:
| It's because Taylor Swift's latest album uses a lot of
| 'delve'.
| Der_Einzige wrote:
| LLMs linguistically colonized humans already so now humans use
| LLM slop in their day-to-day verbal communications.
|
| Unironically the plot of MGS5 the Phantom Pain literally
| happened IRL. Skullface would be proud!
| kome wrote:
| lol, so what?
| pkoird wrote:
| Unpopular opinion but I really liked Microsoft Academic instead
| until they canned it, sadly.
| afandian wrote:
| What do you make of OpenAlex, which inherited the dataset?
| breuleux wrote:
| I liked Microsoft Academic far better, if only because it
| actually had an API.
| photochemsyn wrote:
| I've been using Google Scholar for a long time, but I'm finding
| ChatGPT search with well-crafted prompts gets more focused and
| relevant results than a complex keyword search on GS does.
| However it's often still easier to find a link to the pdf version
| of the paper using GS, but then scihub is still an option and can
| work when all else fails.
| chris_wot wrote:
| How long till they kill it?
| looneysquash wrote:
| Oh good, it's just a celebration and not an announcement that
| they're killing it.
| rnewme wrote:
| Time goes by fast. It's interesting to think how the author's son is
| now 20 as well.
|
| Another interesting thing is the little popup form at the end of the
| post asking me if my opinion of Google changed for the better after
| reading the post. I mean maybe a bit, but the form definitely
| knocked the score back down.
| agnishom wrote:
| Google Scholar is extremely valuable to the academic community. I
| am afraid that Google will decide to scrap it someday, and we
| will be left with a number of inferior alternatives.
| idunnoman1222 wrote:
| Like Anna's Archive?
| domoritz wrote:
| Semantic Scholar is pretty good so I keep using it more and
| more.
| jonas21 wrote:
| Google employs thousands of researchers who would be less
| productive (and upset) if they scrapped it. That alone is
| probably enough to make it worthwhile to keep it going, at
| least until a good alternative emerges.
| elAhmo wrote:
| Given that they have killed products with millions of users,
| including a lot of paying users, relying on this is
| optimistic. Google doesn't seem to care about the major
| inconvenience they cause, like with the Google Domains sale to
| Squarespace.
| leemee wrote:
| I think the point was that Google is _sometimes_ willing to
| support projects if it helps their employees do their job,
| which might be the case here.
| jillesvangurp wrote:
| Google employs a lot of academics that probably use it. And of
| course they have a few AI related products that are probably
| being trained on scientific content as well. I bet Google
| Scholar feeds data into that effort. My guess is that keeping
| Google Scholar up and running isn't breaking the bank for them
| and it is actually a valuable resource for them.
| kmmlng wrote:
| Well, at least Google Scholar is aligned with Google's core
| business: search. It seems silly for Google to scrap search
| features. On the other hand, I'm not sure if Google Scholar is
| aligned with their _real_ core business: ads.
| asdff wrote:
| IMO pubmed is superior for life sciences, especially if you use
| their entrez direct. Really powerful query tooling.
| 1propionyl wrote:
| A reminder to everyone: if you want a "legal" copy of a paper you
| can always just try emailing one of the first authors. They will
| 99.99% send you back a PDF.
| dredmorbius wrote:
| Dead authors don't.
|
| The friction is tremendously higher than on-demand downloadable
| options: LibGen, SciHub, ZLibrary, Anna's Archive, or even
| sources such as ArXiv, SocArXiv, SSRN, which are far more
| fragmentary and limited.
| ultimoo wrote:
| "Now with AI outlines, you can quickly grasp the main points or
| delve into specific details that pique your interest"
|
| is this a nod to pg's delve blowup on twitter?
| fforflo wrote:
| Haha, that, or it's a validation of the blowup.
| MollyRealized wrote:
| The availability of case law has been a massive bonus.
| consf wrote:
| Google Scholar was an absolute lifesaver during my university
| years! Reading this journey makes me appreciate even more how
| much thought and effort went into creating such a valuable
| resource. I remember the frustration of hitting paywalls or
| struggling to track down references in the library.
| codeflo wrote:
| Pushing a half-abandoned but widely beloved project into the
| visibility of the bean counters at Google with a birthday
| announcement like that is a dangerous game. Best of luck.
| uecker wrote:
| Sadly, this is a very valid concern.
| llm_trw wrote:
| Google is a danger to the world, not because it's a monopoly
| but because it makes wonderful tools that are better than
| anything else available at the time. Everything else goes bust.
| Then Google shutters the tool and we're left worse off than if they
| did nothing.
| 2dvisio wrote:
| 20 years and still no API. In my past as an academic I've tried
| several times to build systems to depend on Scholar and was
| always taken aback by the lack of an API. I get it was not to be
| swallowed whole by other publishers etc, but that has reduced the
| potential of the product.
| asdff wrote:
| What field are you in? If you are in life sciences the pubmed
| api (entrez direct) is pretty good.
| mkatx wrote:
| You mean public, documented APIs? Everything is/has an API.
| PeterStuer wrote:
| I love it when I receive a Scholar mail informing me that there is a
| new citation of a 20+ year old long forgotten paper.
| foxbee wrote:
| I found the post interestingly personable, something that I don't
| often find with Google. I've used Google Scholar for many years,
| before I used Elsevier and it was a gamechanger.
| QuantumG wrote:
| CiteSeer we barely knew you.
| esafak wrote:
| I'm surprised there are so few comments about it. It had more
| features than Google Scholar.
| cryptozeus wrote:
| Slightly unrelated but I also enjoyed Google's magazines section
|
| https://books.google.com/books/magazines/language/en
| guwop wrote:
| for people upset with google scholar's lack of an API, check out
| openalex! awesome project. but crazy to think how much net
| positive google scholar has provided for the world..
___________________________________________________________________
(page generated 2024-11-19 23:02 UTC)