[HN Gopher] Fun facts about Google Scholar
       ___________________________________________________________________
        
       Fun facts about Google Scholar
        
       Author : thepuppet33r
       Score  : 124 points
       Date   : 2024-11-18 18:01 UTC (4 hours ago)
        
 (HTM) web link (blog.google)
 (TXT) w3m dump (blog.google)
        
       | thepuppet33r wrote:
       | Yes, Google deserves to be distrusted and avoided as a whole, but
       | Google Scholar is a genuinely net good for humanity.
        
         | dumpHero2 wrote:
         | I have a similar feeling about Gmail (its anti-spam engine is
         | effective), Google Maps and Google Docs (which pioneered shared
         | docs. It feels outdated on many fronts now, but it was a
         | pioneer).
        
           | roflmaostc wrote:
           | anti-spam is only an issue if people dump their email
           | anywhere. I usually register my mail on webpages as
           | first.last+webpage@mail.com and once they spam that
           | address, it gets blacklisted.
           | 
           | I literally get only 1-3 real spam mails per month without
           | any filter.
        
             | dripton wrote:
             | Works great, until a page rejects email with a '+' in it.
        
               | 6510 wrote:
               | dots are ignored, can filter by john.doe@gmail.com
               | 
               | not sure about capital letters
        
               | hks0 wrote:
               | Not everyone's cup of tea, but quite nice if one can
               | afford it: I have my personal domain and a catch-all
               | inbox. So if I want to register at acme-co.xyz I will
               | just use acmecoxyz@my-domain.tld
               | 
               | Maybe I should start using random words though? Wonder if
               | someone will go bananas seeing their brand's name on my
               | domain.
        
               | kroltan wrote:
               | Yeah, I've had to explain that a couple times already,
               | usually when dealing with customer support or in-person
               | registrations.
               | 
               | And a "malicious" actor can get away with pretending to
               | be another company by spoofing the username if they know
               | your domain works like that. I don't think this has
               | reached spammers' repertoire yet, but I wouldn't be
               | surprised.
               | 
               | Eventually I'd like to have a way of generating random
               | email addresses that accept mail on demand, and put
               | everything else in quarantine automatically.
        
               | AshamedCaptain wrote:
               | Or just knows about this Gmail trick (it's been 20 years
               | already) and sends spam to your real mailbox.
               | 
               | Actually, I am surprised _any_ spammy website these days
               | would even honor the part after the +, and not just
               | directly send to the real mailbox name.
        
               | thechao wrote:
               | I used to require a "+..." on all emails. Any email that
               | didn't have the "+..." was sent to Spam automagically. My
               | family were whitelisted. I gave up, because too many
               | websites (early on) refused to take the "+..." marker, so
               | I ended up losing too much to Spam. It's easier to just
               | let Google sort it out.
        
               | gnopgnip wrote:
               | It's part of RFC 5233 Sieve Email Filtering: Subaddress
               | Extension
        
             | janalsncm wrote:
             | I see this recommendation everywhere and I am genuinely
             | surprised that it works. Any spammer can find out your real
             | address since there is an obvious mapping from + addresses
             | to your real address. An actual solution would hide this
             | mapping.
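
A minimal sketch of the "obvious mapping" described above, assuming the
provider treats '+' as the subaddress separator (that is provider-specific)
and that Gmail additionally ignores dots and letter case in the local part.
The function name and the domain set are illustrative, not any mail
provider's API.

```python
# Hypothetical helper: recover the base mailbox from a tagged address.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}  # assumption: dots/case ignored here


def canonical_address(address: str) -> str:
    """Strip the '+tag' part; for Gmail domains also drop dots and case."""
    local, _, domain = address.rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    domain = domain.lower()
    # Everything after the first '+' is the tag the user added.
    base = local.split("+", 1)[0]
    if domain in GMAIL_DOMAINS:
        base = base.replace(".", "").lower()
    return f"{base}@{domain}"


if __name__ == "__main__":
    print(canonical_address("first.last+webpage@mail.com"))
    print(canonical_address("John.Doe+shop@Gmail.com"))
```

Because the mapping is a one-line string operation, a tagged address only
helps against senders who do not bother to strip the tag.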
        
               | bachmeier wrote:
               | Yeah. Fastmail masked addresses are random. The best you
               | can do is guess that an address might be masked, due to
               | it not being johnsmith@fastmail.com, but it provides no
               | information about your real email address.
        
           | coderintherye wrote:
           | Good for users of Gmail, but is it a net good? Gmail spam
           | prevention is great for the Google Apps orgs I manage.
           | However, for the other inboxes the vast majority of spam they
           | receive comes from @gmail.com
        
             | thaumasiotes wrote:
             | > Gmail spam prevention is great for the Google Apps orgs I
             | manage.
             | 
             | Gmail is unlikely to let spam through.
             | 
             | But that doesn't make its spam filter great; it's also very
             | prone to blocking personal communication on the grounds
             | that it must actually have been spam. The principle of
             | gmail's spam filter is just "don't let anything through".
             | 
             | It would be much better to get more spam and also not have
             | my actual communications disappear.
        
           | whiplash451 wrote:
           | Try MS OneDrive before calling google docs outdated
           | 
           | Google spanks everyone else on robustness and responsiveness
        
             | rty32 wrote:
             | Yes until it fails
             | 
             | https://www.theverge.com/2023/11/27/23978591/google-drive-
             | de...
        
               | whiplash451 wrote:
               | That issue got resolved in a few days [1] -- and for each
               | and every one of these extremely rare events at Google,
               | you'll find similar ones at MS.
               | 
               | I am referring to robustness at scale and every day:
               | Google released auto-save years before MS. MS pales in
               | comparison in the UX.
               | 
               | Note: I have no vested interest in Google, not ex-
               | googler, etc.
               | 
               | [1]
               | https://support.google.com/drive/thread/245861992/drive-
               | for-...
        
           | globular-toast wrote:
           | Google maps would only be a net good if the data was
           | available under a free licence. As it is they take data from
           | people that should have gone to a public project like
           | OpenStreetMap.
        
             | arccy wrote:
             | "take", these people would never have produced any data if
             | gmaps wasn't there...
        
               | hatthew wrote:
               | At one point I contributed quite a bit to google maps,
               | because it was the primary map system I was using at the
               | time. Had I been using an OSM-based system, I would have
               | made contributions there instead.
        
               | arccy wrote:
               | indeed, osm can't paint itself like a victim, it needs
               | good end products to bring in contributors.
        
           | gray_-_wolf wrote:
           | Most of the spam I get is _from_ gmail. Maybe they should
           | apply their so effective spam engine to outgoing mail as
           | well...
        
             | crazygringo wrote:
             | It's probably not. You can put any domain you want on the
             | "from" address. Just because it says it was from Gmail
             | doesn't mean it actually was, unless it's signed with DKIM
             | etc.
             | 
             | I had a domain for a while that people got spam "from" all
             | the time. It had nothing to do with me and there was
             | nothing I could do about it.
        
               | dpifke wrote:
               | I run mail servers for myself, a couple of side projects,
               | and some friends and family. A double-digit percentage of
               | all spam caught by my filters is from Google's mail
               | servers, not just forged @gmail.com addresses.
               | 
               | Of the "too big to block outright" spam senders, behind
               | Twilio Sendgrid and Weebly, Google is currently #3.
               | Amazon is a close #4. None of the top four currently have
               | useful abuse reporting mechanisms... Sendgrid used to be
               | OK, but they no longer seem to take any action. Google
               | doesn't even accept abuse reports, which is ironic
               | because "does not accept or act upon abuse reports" is a
               | criterion for being blocked by Google.
               | 
               | Most spam from Google is fake invoices and 419 scams.
               | This is trivially filtered on my end, which makes it
               | perplexing Google doesn't choose to do so. I can
               | guarantee that exactly 0% of Gmail users sending out
               | renewal invoices for "N0rton Anti-Virus" are legitimate.
        
               | gray_-_wolf wrote:
               | I would hope google has DKIM and SPF set.
        
       | renewiltord wrote:
       | Google Scholar is fantastic stuff. I am so grateful for it. It's
       | crazy how easy it is to find papers these days by just going to
       | it. University library search functions are completely useless in
       | comparison.
        
       | elashri wrote:
       | I did not know about the Scholar PDF Reader extension [1].
       | Unfortunately the reason is that I use Firefox only (and Safari
       | on iOS) and it is not available there. The AI outlines will be
       | useful and I can see myself using it.
       | 
       | I do not want to comment on number 20. I really wished that I
       | joined CERN 10 years earlier but then it is the mistake of my
       | parents :)
       | 
       | [1] https://chromewebstore.google.com/detail/google-scholar-
       | pdf-...
        
       | dctoedt wrote:
       | I'd not known about "F.D.C. Willard" -- the _nom de plume_ of a
       | Michigan State physics professor's Siamese cat, Chester -- who
       | was listed as a co-author of a number of the professor's physics
       | papers.
       | 
       | More on Chester and his co-author status:
       | https://en.wikipedia.org/wiki/F._D._C._Willard
        
       | mananaysiempre wrote:
       | 21. Google Scholar will deny access to you if you (need to) self-
       | host a VPN on a common VPS provider. Being a Google product, it
       | also can't be special-cased in your routing table. (I genuinely
       | had to retrain myself to use Google Scholar again once I no
       | longer had that need.)
       | 
       | 22. Switching on sort by date will impose a filter to papers
       | published within the year, and you cannot do anything about that.
        
         | eesmith wrote:
         | > 22. Switching on sort by date will impose a filter to papers
         | published within the year, and you cannot do anything about
         | that.
         | 
         | !!! And here I thought it's been broken for years, and a sign
         | of decay due to lack of internal support.
        
           | buildbot wrote:
           | I swear this was working for me until literally today, it was
           | really useful to find older ML papers?!
        
             | mananaysiempre wrote:
             | There is _filter_ by date and _sort_ by date. The former
             | works. The latter, when enabled, even adds a banner on top
             | of the page (in large but gray type) that says "Articles
             | added in the last year, sorted by date", and resets any
             | filter you might have set before.
        
               | MichaelZuo wrote:
               | Was this change ever logged or noted some way? Or did it
               | just show up one day?
        
               | philipkglass wrote:
               | If it ever returned time-sorted results without limit,
               | that was long in the past. It has truncated results to
               | one year for the last several years I have used Scholar.
        
               | crazygringo wrote:
               | It seems so intentionally "broken", I can only guess it
               | is to prevent scraping? Since searching for generic-ish
               | search terms and sorting by date is a common scraping
               | strategy.
               | 
               | Still, you'd think they'd do a cutoff of e.g. 500 or
               | 1,000 items rather than filter by the past year.
               | 
               | So I can't help but wonder if it's a contractual
               | limitation insisted on by publishers? Since the
               | publishers also don't want all their papers being
               | spidered via Scholar? It feels kind of like a limitation
               | a lawyer came up with.
        
       | svat wrote:
       | Related: 2014 article by Steven Levy, titled "The Gentleman Who
       | Made Scholar": https://www.wired.com/2014/10/the-gentleman-who-
       | made-scholar...
        
         | Thrymr wrote:
         | > Would he want to continue working on Scholar for another ten
         | years? "One always believes there are other opportunities, but
         | the problem is how to pursue them when you are in a place you
         | like and you have been doing really well. I can do problems
         | that seem very interesting to me -- but the biggest impact I
         | can possibly make is helping people who are solving the world's
         | problems to be more efficient. If I can make the world's
         | researchers ten percent more efficient, consider the cumulative
         | impact of that. So if I ended up spending the next ten years
         | doing this, I think I would be extremely happy."
         | 
         | Has he still been working on it in the 10 years since this
         | article? His name is in the byline of the new blog post, but
         | it's not clear from that how much he's been working on it.
        
           | the-rc wrote:
           | 12-13 years ago, I ran the system that inlined Scholar and
           | other results on the main search result pages. Anurag was
           | still involved, but AFAIR Alex, the other author of the post
           | who also had been there from the start, worked on most code
           | changes. I would guess that things are more or less the same
           | today. (Because it had such limited headcount, Scholar was
           | known to lag behind other services when it came to
           | code/infrastructure migrations.)
        
             | jll29 wrote:
             | Thanks for that inside scoop, even if it's a bit dated; I
             | wonder if they read this discussion, perhaps.
             | 
             | An important feature request would be a view where only
             | peer-reviewed publications (specifically, not ArXiv and
             | other pre-print archives) are included in the citation
             | counts, and self-citations are also excluded.
             | 
             | A way to download all citation sources would also be a
             | great nice-to-have.
        
       | zeroonetwothree wrote:
       | Google Scholar is so good. I started doing research right when it
       | came out and it was amazingly helpful. I can't imagine how it was
       | done before.
        
         | IshKebab wrote:
         | There are alternatives, like Web of Knowledge. You basically
         | need to be in a Uni for that though.
        
         | leephillips wrote:
         | I would go to the library and pull volumes of _Science Citation
         | Index_ off the shelves. Yes, Google Scholar was a revolution.
        
         | dekhn wrote:
         | I'd go to the card catalog (index), turn my question into a bag
         | of words (tokenize), fetch all the cards matching each token
         | (posting lists), drop cards which didn't include enough of the
         | tokens (posting list intersection), order the cards by the
         | number of tokens they matched (keyword match ranking), filter
         | at some cutoff, and then reorder based on the h-index of the
         | author (page rank). Then I would read each paper in order,
         | following citations in a breadth-first manner.
         | 
         | (the above is a joke comparing old school library work to
         | search engines circa 2000; I didn't actually do all those
         | steps. I'd usually just find the most recent review article and
         | read the papers it cited).
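
The card-catalog steps above can be sketched as a toy keyword search. This
is only an illustration of the joke's pipeline (tokenize, posting lists,
match-count filter, ranking, authority reorder), not how Scholar or any real
engine works. The card data, field names, and the `min_matches` and `cutoff`
parameters are all made up for the example.

```python
from collections import defaultdict

# Hypothetical catalog: card id -> title and the author's h-index.
CARDS = {
    1: {"title": "Exchange effects in solid helium", "h_index": 12},
    2: {"title": "Magnetic exchange in helium three", "h_index": 30},
    3: {"title": "Card catalogs in libraries", "h_index": 50},
}


def tokenize(text):
    """Turn text into a bag of lowercase words."""
    return text.lower().split()


def build_index(cards):
    """Map each token to the set of card ids containing it (posting lists)."""
    index = defaultdict(set)
    for card_id, card in cards.items():
        for token in tokenize(card["title"]):
            index[token].add(card_id)
    return index


def search(query, cards, index, min_matches=2, cutoff=10):
    """Return card ids for the query, best authority score first."""
    match_counts = defaultdict(int)
    for token in set(tokenize(query)):
        for card_id in index.get(token, ()):
            match_counts[card_id] += 1
    # Drop cards that matched too few query tokens.
    kept = [cid for cid, n in match_counts.items() if n >= min_matches]
    # Rank by number of matched tokens, then cut off the tail.
    kept.sort(key=lambda cid: (-match_counts[cid], cid))
    kept = kept[:cutoff]
    # Reorder the survivors by the author's h-index (stable sort).
    kept.sort(key=lambda cid: -cards[cid]["h_index"])
    return kept


if __name__ == "__main__":
    print(search("helium exchange effects", CARDS, build_index(CARDS)))
```

With the sample cards, card 3 has the highest h-index but matches no query
token, so it is dropped before the authority reorder ever sees it.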
        
       | kylebenzle wrote:
       | I was hoping it would be 20 tips and tricks on how to use the
       | service better not random fun facts about its history :-(
        
       | chromatin wrote:
       | 21. No API
        
       | malshe wrote:
       | I use Google Scholar daily and it's been a fantastic resource.
       | Google Scholar with Zotero completes my articles search and
       | storage.
       | 
       | Btw, Anurag's last name is misspelt under the picture. It reads
       | "Achurya" instead of "Acharya"
       | 
       | Edit: They fixed it
        
       | lbeckman314 wrote:
       | > 18. A paw-sitive contribution to Physics. F.D.C Willard
       | (otherwise known as Chester, the Siamese cat) is listed as a co-
       | author on an article entitled: "Two, Three, and Four-Atom
       | Exchange Effects" that explores the magnetic properties of solid
       | helium-3 and how interactions between its atoms influence its
       | behavior at extremely low temperatures. Chester's starring role
       | came about because his co-author/owner, Jack H. Hetherington
       | wrote the entire paper with the plural "we" instead of a single
       | "I."
       | 
       | ---
       | 
       | 'Two-, Three-, and Four-Atom Exchange Effects in bcc 3He' by J.
       | H. Hetherington and F. D. C. Willard [0, 1, 2]
       | 
       | [0]
       | https://xkeys.com/media/wysiwyg/smartwave/porto/category/abo...
       | 
       | [1] https://xkeys.com/about/jackspages/fdcwillard.html
       | 
       | [2] https://en.wikipedia.org/wiki/F._D._C._Willard
        
       | russellbeattie wrote:
       | Huh. I tried the "Listen to article" button, because I knew it
       | was going to be generated and was curious to hear how it sounded.
       | 
       | Interestingly, it highlighted the words as it read. I haven't
       | seen that before online. Not sure how useful it is (especially
       | for anyone interested in this particular topic), but I thought it
       | was a neat innovation nevertheless.
        
       | gexaha wrote:
       | The most fun fact is that it still exists!
        
       | robwwilliams wrote:
       | Our department uses GScholar as a great research-focused CV
       | generator. Not used formally except that faculty pages have a
       | link to their GS pages.
        
       | wseqyrku wrote:
       | For a second I thought this was buzzfeed for some reason.
        
       | GeoAtreides wrote:
       | oh no
       | 
       | they remembered google scholar exists
       | 
       | it's a great product and I don't trust google at all not to break
       | it or mess with it
        
         | crazygringo wrote:
         | Google employs a lot of people from academia. Scholar is used
         | and loved by a _lot_ of people _within_ Google. It's been
         | around for two decades. I really don't think it's going
         | anywhere.
        
           | dekhn wrote:
           | Reader was used and loved by a LOT of people WITHIN google,
           | but it was shut down (and the leadership that loved it even
           | made arguments in front of the company why it "had to be shut
           | down").
           | 
           | AFAICT Scholar remains because Anurag built up massive cred
           | in the early years (he was a critically important search
           | engineer) with Larry Page and kept his infra costs and
           | headcount really small, while also taking advantage of search
           | infra.
        
       | afandian wrote:
       | Some fun Google Scholar history from another perspective.
       | 
       | https://youtu.be/DZ2Bgwyx3nU?t=315
       | 
       | I recommend you watch the rest of the video, on the subject of
       | open/closed and enclosure of infrastructure.
        
       | teruakohatu wrote:
       | The best thing, by a long way, that Google Scholar has achieved
       | is denying Elsevier & co a monopoly on academic search.
       | 
       | In most universities here in New Zealand, articles have to be
       | published in a journal indexed by Elsevier's Scopus. If it is not
       | in a Scopus-indexed journal, it counts no more than a reddit
       | comment. This gives Elsevier tremendous power. But in CS/ML/AI
       | most academics and students turn to Google Scholar first when
       | doing searches.
        
         | freefaler wrote:
         | or turn to sci-hub and annas-archive :)
        
           | philipkglass wrote:
           | You use Google Scholar to find papers you're interested in,
           | then use sci-hub to actually read them.
        
             | freefaler wrote:
             | indeed... and use Zotero with the correct plugin to
             | download them automagically
        
               | epcoa wrote:
               | sci-hub hasn't been updated in 4 years and the sources
               | for annas-archive like nexus-stc are seriously hit or
               | miss (depends on the field).
        
               | freefaler wrote:
               | Nothing lasts forever, but the model of buying a paper
               | for $40 from Elsevier isn't much better. Depending on the
               | field there are other sources, but the hit rate is still
               | about 85-90%.
        
           | teruakohatu wrote:
           | Does sci-hub have up to date content these days?
           | 
           | Having pretty wide journal access through my institution
           | means I don't need to reach out to sci-hub.
        
             | epcoa wrote:
             | sci-hub proper hasn't been updated since its indefinite
             | pause in December 2020. Alternatives are of variable
             | success depending on field. It might be better for CS/Math,
             | but for medicine and life sciences it's pretty bad.
        
               | whimsicalism wrote:
               | i believe they paused due to an indian court injunction
               | and the case was heard this year, does anyone know any
               | update?
        
           | whimsicalism wrote:
           | scihub is dying unfortunately :( the good news is it is
           | happening just as all the fields i'm interested in except for
           | some experimental physics & biology have moved to OA
        
       | jrochkind1 wrote:
       | > 1. The team started with just two of us.
       | 
       | My guess for a while has been that it was back to two of them! if
       | that!
        
       | p4bl0 wrote:
       | I wish GScholar wouldn't embrace bibliometrics so much. Sort
       | papers by date (most recent papers first) by default on an
       | author's page rather than by citation count, or at least give
       | authors the choice to individually opt in to sort by date by
       | default.
        
       | random3 wrote:
       | Fun fact about Google Scholar: it's "free", but it's just another
       | soulless Google product - no clear strategy, no support, and a
       | fragile proprietary dependency in what should be an open
       | ecosystem. This creates inherent risks for the academic
       | community. We need the equivalent of arXiv for Google Scholar
        
         | afandian wrote:
         | The Invest in Open site has a good directory of open tools.
         | 
         | https://infrafinder.investinopen.org/solutions
        
       | theanonymousone wrote:
       | The post uses the expression "delve into" :-/
        
         | sourcepluck wrote:
         | Is this a jokey reference to that time Paul Graham upset large
         | amounts of Nigerians on Twitter? Or, rather, genuine concern at
         | the thought that the article may have been generated by
         | chatbots?
        
       ___________________________________________________________________
       (page generated 2024-11-18 23:00 UTC)