[HN Gopher] Fun facts about Google Scholar
___________________________________________________________________
Fun facts about Google Scholar
Author : thepuppet33r
Score : 124 points
Date : 2024-11-18 18:01 UTC (4 hours ago)
(HTM) web link (blog.google)
(TXT) w3m dump (blog.google)
| thepuppet33r wrote:
| Yes, Google deserves to be distrusted and avoided as a whole, but
| Google Scholar is a genuinely net good for humanity.
| dumpHero2 wrote:
| I have a similar feeling for Gmail (its effective anti-spam
| engine), google maps and google docs (which pioneered shared
| docs. It feels outdated on many fronts now, but it was a
| pioneer).
| roflmaostc wrote:
| anti-spam is only an issue if people dump their email
| anywhere. I usually register my mail on webpages as
| first.last+webpage@mail.com and once they spam this
| address, it gets blacklisted.
|
| I literally get only 1-3 real spam mails per month without
| any filter.
| dripton wrote:
| Works great, until a page rejects email with a '+' in it.
| 6510 wrote:
| dots are ignored, can filter by john.doe@gmail.com
|
| not sure about capital letters
| hks0 wrote:
| Not everyone's cup of tea, but quite nice if one can
| afford it: I have my personal domain and a catch-all
| inbox. So if I want to register at acme-co.xyz I will
| just use acmecoxyz@my-domain.tld
|
| Maybe I should start using random words though? Wonder if
| someone will go bananas seeing their brand's name on my
| domain.
| kroltan wrote:
| Yeah, I've had to explain that a couple times already,
| usually when dealing with customer support or in-person
| registrations.
|
| And a "malicious" actor can get away with pretending to
| be another company by spoofing the username if they know
| your domain works like that. I don't think this has
| reached spammers' repertoire yet, but I wouldn't be
| surprised.
|
| Eventually I'd like to have a way of generating random
| email addresses that accept mail on demand, and put
| everything else in quarantine automatically.
| AshamedCaptain wrote:
| Or just knows about this Gmail trick (it's been 20 years
| already) and sends spam to your real mailbox.
|
| Actually, I am surprised _any_ spammy website these days
| would even honor the part after the +, and not just
| directly send to the real mailbox name.
| thechao wrote:
| I used to require a "+..." on all emails. Any email that
| didn't have the "+..." was sent to Spam automagically. My
| family were whitelisted. I gave up, because too many
| websites (early on) refused to take the "+..." marker, so
| I ended up losing too much to Spam. It's easier to just
| let Google sort it out.
| gnopgnip wrote:
| It's part of RFC 5233 Sieve Email Filtering: Subaddress
| Extension
| janalsncm wrote:
| I see this recommendation everywhere and I am genuinely
| surprised that it works. Any spammer can find out your real
| address since there is an obvious mapping from + addresses
| to your real address. An actual solution would hide this
| mapping.
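
The "obvious mapping" mentioned above can be sketched in a few lines. This is a minimal illustration only, assuming the common convention that everything between the first "+" and the "@" is a tag; the function name `strip_subaddress` is hypothetical, and real providers may use other separators or rules.

```python
def strip_subaddress(address: str, separator: str = "+") -> str:
    """Return the base mailbox for a plus-addressed email address.

    Assumes the tag starts at the first separator in the local part.
    """
    local, at, domain = address.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {address!r}")
    # Keep only the part of the local name before the separator.
    base, _, _tag = local.partition(separator)
    return f"{base}@{domain}"


print(strip_subaddress("first.last+webpage@mail.com"))
```

Because the tag is dropped with one string split, a plus-address does not hide the real mailbox; a randomly generated alias has no such reversible structure.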
| bachmeier wrote:
| Yeah. Fastmail masked addresses are random. The best you
| can do is guess that an address might be masked, due to
| it not being johnsmith@fastmail.com, but it provides no
| information about your real email address.
| coderintherye wrote:
| Good for users of Gmail, but is it a net good? Gmail spam
| prevention is great for the Google Apps orgs I manage.
| However, for the other inboxes the vast majority of spam they
| receive comes from @gmail.com
| thaumasiotes wrote:
| > Gmail spam prevention is great for the Google Apps orgs I
| manage.
|
| Gmail is unlikely to let spam through.
|
| But that doesn't make its spam filter great; it's also very
| prone to blocking personal communication on the grounds
| that it must actually have been spam. The principle of
| gmail's spam filter is just "don't let anything through".
|
| It would be much better to get more spam and also not have
| my actual communications disappear.
| whiplash451 wrote:
| Try MS OneDrive before calling google docs outdated
|
| Google spanks everyone else on robustness and responsiveness
| rty32 wrote:
| Yes until it fails
|
| https://www.theverge.com/2023/11/27/23978591/google-drive-
| de...
| whiplash451 wrote:
| That issue got resolved in a few days [1] -- and for each
| and every one of these extremely rare events at Google,
| you'll find similar ones at MS.
|
| I am referring to robustness at scale and every day:
| Google released auto-save years before MS. MS pales in
| comparison in the UX.
|
| Note: I have no vested interest in Google, not ex-
| googler, etc.
|
| [1]
| https://support.google.com/drive/thread/245861992/drive-
| for-...
| globular-toast wrote:
| Google maps would only be a net good if the data was
| available under a free licence. As it is they take data from
| people that should have gone to a public project like
| OpenStreetMap.
| arccy wrote:
| "take", these people would never have produced any data if
| gmaps wasn't there...
| hatthew wrote:
| At one point I contributed quite a bit to google maps,
| because it was the primary map system I was using at the
| time. Had I been using an OSM-based system, I would have
| made contributions there instead.
| arccy wrote:
| indeed, osm can't paint itself like a victim, it needs
| good end products to bring in contributors.
| gray_-_wolf wrote:
| Most of the spam I get is _from_ gmail. Maybe they should
| apply their so effective spam engine to outgoing mail as
| well...
| crazygringo wrote:
| It's probably not. You can put any domain you want on the
| "from" address. Just because it says it was from Gmail
| doesn't mean it actually was, unless it's signed with DKIM
| etc.
|
| I had a domain for a while that people got spam "from" all
| the time. It had nothing to do with me and there was
| nothing I could do about it.
| dpifke wrote:
| I run mail servers for myself, a couple of side projects,
| and some friends and family. A double-digit percentage of
| all spam caught by my filters is from Google's mail
| servers, not just forged @gmail.com addresses.
|
| Of the "too big to block outright" spam senders, behind
| Twilio Sendgrid and Weebly, Google is currently #3.
| Amazon is a close #4. None of the top four currently have
| useful abuse reporting mechanisms... Sendgrid used to be
| OK, but they no longer seem to take any action. Google
| doesn't even accept abuse reports, which is ironic
| because "does not accept or act upon abuse reports" is
| a criterion for being blocked by Google.
|
| Most spam from Google is fake invoices and 419 scams.
| This is trivially filtered on my end, which makes it
| perplexing Google doesn't choose to do so. I can
| guarantee that exactly 0% of Gmail users sending out
| renewal invoices for "N0rton Anti-Virus" are legitimate.
| gray_-_wolf wrote:
| I would hope google has DKIM and SPF set.
| renewiltord wrote:
| Google Scholar is fantastic stuff. I am so grateful for it. It's
| crazy how easy it is to find papers these days by just going to
| it. University library search functions are completely useless in
| comparison.
| elashri wrote:
| I did not know about the Scholar PDF Reader extension [1].
| Unfortunately the reason is that I use Firefox only (and safari
| iOS) and it is not available there. The AI outlines will be
| useful and I can see myself using it.
|
| I do not want to comment on number 20. I really wished that I
| joined CERN 10 years earlier but then it is the mistake of my
| parents :)
|
| [1] https://chromewebstore.google.com/detail/google-scholar-
| pdf-...
| dctoedt wrote:
| I'd not known about "F.D.C. Willard" -- the _nom de plume_ of a
| Michigan State physics professor's Siamese cat, Chester -- who
| was listed as a co-author of a number of the professor's physics
| papers.
|
| More on Chester and his co-author status:
| https://en.wikipedia.org/wiki/F._D._C._Willard
| mananaysiempre wrote:
| 21. Google Scholar will deny access to you if you (need to) self-
| host a VPN on a common VPS provider. Being a Google product, it
| also can't be special-cased in your routing table. (I genuinely
| had to retrain myself to use Google Scholar again once I no
| longer had that need.)
|
| 22. Switching on sort by date will impose a filter to papers
| published within the year, and you cannot do anything about that.
| eesmith wrote:
| > 22. Switching on sort by date will impose a filter to papers
| published within the year, and you cannot do anything about
| that.
|
| !!! And here I thought it's been broken for years, and a sign
| of decay due to lack of internal support.
| buildbot wrote:
| I swear this was working for me until literally today, it was
| really useful to find older ML papers?!
| mananaysiempre wrote:
| There is _filter_ by date and _sort_ by date. The former
| works. The latter, when enabled, even adds a banner on top
| of the page (in large but gray type) that says "Articles
| added in the last year, sorted by date", and resets any
| filter you might have set before.
| MichaelZuo wrote:
| Was this change ever logged or noted some way? Or did it
| just show up one day?
| philipkglass wrote:
| If it ever returned time-sorted results without limit,
| that was long in the past. It has truncated results to
| one year for the last several years I have used Scholar.
| crazygringo wrote:
| It seems so intentionally "broken", I can only guess it
| is to prevent scraping? Since searching for generic-ish
| search terms and sorting by date is a common scraping
| strategy.
|
| Still, you'd think they'd do a cutoff of e.g. 500 or
| 1,000 items rather than filter by the past year.
|
| So I can't help but wonder if it's a contractual
| limitation insisted on by publishers? Since the
| publishers also don't want all their papers being
| spidered via Scholar? It feels kind of like a limitation
| a lawyer came up with.
| svat wrote:
| Related: 2014 article by Steven Levy, titled "The Gentleman Who
| Made Scholar": https://www.wired.com/2014/10/the-gentleman-who-
| made-scholar...
| Thrymr wrote:
| > Would he want to continue working on Scholar for another ten
| years? "One always believes there are other opportunities, but
| the problem is how to pursue them when you are in a place you
| like and you have been doing really well. I can do problems
| that seem very interesting to me -- but the biggest impact I can
| possibly make is helping people who are solving the world's
| problems to be more efficient. If I can make the world's
| researchers ten percent more efficient, consider the cumulative
| impact of that. So if I ended up spending the next ten years
| doing this, I think I would be extremely happy."
|
| Has he still been working on it in the 10 years since this
| article? His name is in the byline of the new blog post, but
| it's not clear from that how much he's been working on it.
| the-rc wrote:
| 12-13 years ago, I ran the system that inlined Scholar and
| other results on the main search result pages. Anurag was
| still involved, but AFAIR Alex, the other author of the post
| who also had been there from the start, worked on most code
| changes. I would guess that things are more or less the same
| today. (Because it had such limited headcount, Scholar was
| known to lag behind other services when it came to
| code/infrastructure migrations.)
| jll29 wrote:
| Thanks for that inside scoop, even if it's a bit dated; I
| wonder if they read this discussion, perhaps.
|
| An important feature request would be a view where only
| peer-reviewed publications (specifically, not ArXiv and
| other pre-print archives) are included in the citation
| counts, and self-citations are also excluded.
|
| A way to download all citation sources would also be a
| great nice-to-have.
| zeroonetwothree wrote:
| Google Scholar is so good. I started doing research right when it
| came out and it was amazingly helpful. I can't imagine how it was
| done before.
| IshKebab wrote:
| There are alternatives, like Web of Knowledge. You basically
| need to be in a Uni for that though.
| leephillips wrote:
| I would go to the library and pull volumes of _Science Citation
| Index_ off the shelves. Yes, Google Scholar was a revolution.
| dekhn wrote:
| I'd go to the card catalog (index), turn my question into a bag
| of words (tokenize), fetch all the cards matching each token
| (posting lists), drop cards which didn't include enough of the
| tokens (posting list intersection), ordering the cards by the
| number of tokens they matched (keyword match ranking), filter
| at some cutoff, and then reorder based on the h-index of the
| author (page rank). Then I would read each paper in order,
| following citations in a breadth-first manner.
|
| (the above is a joke comparing old school library work to
| search engines circa 2000; I didn't actually do all those
| steps. I'd usually just find the most recent review article and
| read the papers it cited).
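
The pipeline in the joke above maps onto a toy keyword search. The following is a rough sketch under stated assumptions, not how any real search engine or library works: the catalog `CARDS`, the per-card score standing in for an author's h-index, and the parameters `min_matches` and `cutoff` are all made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical catalog: card id -> (title, score standing in for h-index)
CARDS = {
    1: ("magnetic properties of solid helium", 12),
    2: ("exchange effects in solid helium", 30),
    3: ("history of the card catalog", 5),
}


def tokenize(text):
    """Turn text into a bag of lowercase words."""
    return text.lower().split()


def build_index(cards):
    """Build posting lists: token -> set of card ids containing it."""
    index = defaultdict(set)
    for card_id, (title, _score) in cards.items():
        for token in tokenize(title):
            index[token].add(card_id)
    return index


def search(query, cards, index, min_matches=2, cutoff=10):
    """Return card ids for the query, best first."""
    matches = Counter()
    for token in set(tokenize(query)):
        for card_id in index.get(token, ()):
            matches[card_id] += 1
    # Drop cards that matched too few tokens.
    kept = [cid for cid, n in matches.items() if n >= min_matches]
    # Rank by number of matched tokens, then truncate at the cutoff.
    kept.sort(key=lambda cid: (-matches[cid], cid))
    kept = kept[:cutoff]
    # Reorder the survivors by the authority score.
    kept.sort(key=lambda cid: -cards[cid][1])
    return kept


INDEX = build_index(CARDS)
print(search("solid helium magnetic", CARDS, INDEX))
```

Note that the final reordering can override the keyword ranking: a card with fewer matching tokens but a higher score ends up first, which is the effect the joke attributes to page rank.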
| kylebenzle wrote:
| I was hoping it would be 20 tips and tricks on how to use the
| service better not random fun facts about its history :-(
| chromatin wrote:
| 21. No API
| malshe wrote:
| I use Google Scholar daily and it's been a fantastic resource.
| Google Scholar with Zotero completes my articles search and
| storage.
|
| Btw, Anurag's last name is misspelt under the picture. It reads
| "Achurya" instead of "Acharya"
|
| Edit: They fixed it
| lbeckman314 wrote:
| > 18. A paw-sitive contribution to Physics. F.D.C Willard
| (otherwise known as Chester, the Siamese cat) is listed as a co-
| author on an article entitled: "Two, Three, and Four-Atom
| Exchange Effects" that explores the magnetic properties of solid
| helium-3 and how interactions between its atoms influence its
| behavior at extremely low temperatures. Chester's starring role
| came about because his co-author/owner, Jack H. Hetherington
| wrote the entire paper with the plural "we" instead of a single
| "I."
|
| ---
|
| 'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J.
| H. Hetherington and F. D. C. Willard [0, 1, 2]
|
| [0]
| https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...
|
| [1] https://xkeys.com/about/jackspages/fdcwillard.html
|
| [2] https://en.wikipedia.org/wiki/F._D._C._Willard
| russellbeattie wrote:
| Huh. I tried the "Listen to article" button, because I knew it
| was going to be generated and was curious to hear how it sounded.
|
| Interestingly, it highlighted the words as it read. I haven't
| seen that before online. Not sure how useful it is (especially
| for anyone interested in this particular topic), but I thought it
| was a neat innovation nevertheless.
| gexaha wrote:
| The most fun fact is that it still exists!
| robwwilliams wrote:
| Our department uses GScholar as a great research-focused CV
| generator. Not used formally except that faculty pages have a
| link to their GS pages.
| wseqyrku wrote:
| For a second I thought this was buzzfeed for some reason.
| GeoAtreides wrote:
| oh no
|
| they remembered google scholar exists
|
| it's a great product and I don't trust google at all not to break
| it or mess with it
| crazygringo wrote:
| Google employs a lot of people from academia. Scholar is used
| and loved by a _lot_ of people _within_ Google. It's been
| around for two decades. I really don't think it's going
| anywhere.
| dekhn wrote:
| Reader was used and loved by a LOT of people WITHIN google,
| but it was shut down (and the leadership that loved it even
| made arguments in front of the company why it "had to be shut
| down").
|
| AFAICT Scholar remains because Anurag built up massive cred
| in the early years (he was a critically important search
| engineer) with Larry Page and kept his infra costs and
| headcount really small, while also taking advantage of search
| infra.
| afandian wrote:
| Some fun Google Scholar history from another perspective.
|
| https://youtu.be/DZ2Bgwyx3nU?t=315
|
| I recommend you watch the rest of the video, on the subject of
| open/closed and enclosure of infrastructure.
| teruakohatu wrote:
| The best thing, by a long way, that Google Scholar has achieved
| is denying Elsevier & co a monopoly on academic search.
|
| In most universities here in New Zealand, articles have to be
| published in a journal indexed by Elsevier's Scopus. If it is not in
| a Scopus-indexed journal, it does not count any more than a reddit
| comment. This gives Elsevier tremendous power. But in CS/ML/AI
| most academics and students turn to Google Scholar first when
| doing searches.
| freefaler wrote:
| or turn to sci-hub and annas-archive :)
| philipkglass wrote:
| You use Google Scholar to find papers you're interested in,
| then use sci-hub to actually read them.
| freefaler wrote:
| indeed... and use Zotero with the correct plugin to
| download them automagically
| epcoa wrote:
| sci-hub hasn't been updated in 4 years and the sources
| for annas-archive like nexus-stc are seriously hit or
| miss (depends on the field).
| freefaler wrote:
| Nothing lasts forever, but the model of buying a paper
| for 40$ from Elsevier isn't much better. Depending on the
| field there are other sources, but still a hit rate is
| about 85-90%.
| teruakohatu wrote:
| Does sci-hub have up to date content these days?
|
| Having pretty wide journal access through my institution
| means I don't need to reach out to sci-hub.
| epcoa wrote:
| sci-hub proper hasn't been updated since its indefinite
| pause in december 2020. Alternatives are of variable
| success depending on field. It might be better for CS/Math,
| but for medicine and life sciences it's pretty bad.
| whimsicalism wrote:
| i believe they paused due to an indian court injunction
| and the case was heard this year, does anyone know any
| update?
| whimsicalism wrote:
| scihub is dying unfortunately :( the good news is it is
| happening just as all the fields i'm interested in except for
| some experimental physics & biology have moved to OA
| jrochkind1 wrote:
| > 1. The team started with just two of us.
|
| My guess for a while has been that it was back to two of them! if
| that!
| p4bl0 wrote:
| I wish GScholar wouldn't embrace bibliometrics so much. Sort
| papers by date (most recent papers first) by default on an
| author's page rather than by citation count, or at least give
| authors the choice to individually opt in to sort by date by
| default.
| random3 wrote:
| Fun fact about Google Scholar: it's "free", but it's just another
| soulless Google product - no clear strategy, no support, and a
| fragile proprietary dependency in what should be an open
| ecosystem. This creates inherent risks for the academic
| community. We need the equivalent of arXiv for Google Scholar
| afandian wrote:
| The Invest in Open site has a good directory of open tools.
|
| https://infrafinder.investinopen.org/solutions
| theanonymousone wrote:
| The post uses the expression "delve into" :-/
| sourcepluck wrote:
| Is this a jokey reference to that time Paul Graham upset large
| amounts of Nigerians on Twitter? Or, rather, genuine concern at
| the thought that the article may have been generated by
| chatbots?
___________________________________________________________________
(page generated 2024-11-18 23:00 UTC)