[HN Gopher] Tell HN: A case of negative SEO I caught on my servi...
       ___________________________________________________________________
        
       Tell HN: A case of negative SEO I caught on my service and how I
       dealt with it
        
       Recently, my service https://next-episode.net experienced a huge
       drop in Google rankings. I've been running it for more than 15
       years, so this is far from the first time this has happened.
       Usually I've been able to attribute big fluctuations (positive or
       negative) either to something I did, a Google algorithm change, or
       some external factor.

       For example, about 2 years ago, something similar happened. While
       digging through my Search Console I discovered that Russian
       websites had generated thousands of links pointing to a page on
       Next Episode, using pornographic keywords as link anchors. This was
       so effective that those keywords reached the top of the "Top
       linking text" report in Google Search Console, which naturally
       (most likely) resulted in a drop in rankings for the regular
       keywords and the domain in general.

       About a week ago, while investigating the current drop in rankings
       and browsing through the "Latest links" export of external links
       from Google Search Console, I noticed something funny. There were
       thousands of links in there (from 3 domains) following the same URL
       structure as Next Episode:

       domain/show-name
       domain/show-name/browse
       domain/show-name/season-1
       etc.

       Following these links revealed something even funnier: all of them
       displayed content directly from my site! Not even scraped/cached
       content - they were dynamically pulling content from my server and
       displaying it on their domain. Even the search, the news archive
       and the top charts worked. Here is a list of those domains as an
       image: https://i.imgur.com/PjNKh0b.png. I've since blocked their
       access, so opening any of them will not show my website right now,
       but here is how it looked: https://i.imgur.com/HBiL3yh.png

       Now, my first thought was that they were maybe scraping the content
       as part of a link farm (to spam with ads?), but I also wanted to
       know more. I experimented with Google searches for text from pages
       on my website, like "Hot Shows - Next Episode", and for very
       specific news post subjects like "Streaming Services Availability
       added to Episodes and Movies" (posted in September last year).
       Imagine my surprise when I discovered that not only were the
       domains above indexed by Google (and listed in the search results),
       but there were 4-5 more domains doing the same thing, and some of
       them even outranked mine!

       Here is a full list of domains that I discovered by searching for
       my news post subjects: https://i.imgur.com/dAm1CzI.png. If you
       Google for site:domain.com you'll see that some of them have
       thousands of pages indexed by Google.

       Trying out more keyword searches, I also discovered these domains:
       https://i.imgur.com/s5YjJWK.png (as they've cached the content,
       they still work). They all seem to be part of the same operation,
       but they serve a different purpose: they have only scraped the home
       page of Next Episode, and all their links point to inside pages on
       the other domains. I suspect this is to generate incoming links to
       the other domains and give them some credibility.

       As with the adult-keyword anchor-text links mentioned above, I
       suspect this whole thing is a negative SEO campaign. I don't see
       any other reason for it to be happening, and it seems to be
       achieving its goal. Once I had found all I could about the domains
       involved, I took some action:

       1) I disavowed all those domains through the Google disavow tool.

       2) I investigated whether I could redirect their pages to mine (as
       they were dynamically pulling the content, I could change it to
       whatever I wanted). I managed to make it work through JavaScript
       (though interestingly, it had to be obfuscated, as they were doing
       some sanitizing when pulling my content and replacing strings like
       "window.location.href" with "window.loc1ion.href"), but in the end
       I decided against it and:

       3) I blocked their IPs through CloudFlare (all Russian IPs). An
       interesting thing here is that once I blocked an IP, the domain
       would somehow automatically switch to another IP to pull my content
       from, but once I had blocked 10 or 15 of them, they seem to have
       run out of IPs, and now they stay blocked.

       I looked for a way to report those domains to Google, but as of
       today, I haven't found the place to do it. Does anybody know?

       Today, about a week after I blocked the domains that pulled content
       from my site, they still have thousands of my pages indexed in
       Google and rank better than me in some search results. I'm guessing
       that, with time, Google will catch up with the fact that they don't
       show any content anymore and will delist those pages.

       This whole thing was very new to me, so I hope it'll raise
       awareness that this is going on and maybe help someone else catch
       it happening to their website. I'd appreciate any feedback on this,
       and I'm around if you have any questions. It would also be
       interesting to hear about anyone's related experiences.
       Cheers!
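The discovery step (thousands of "Latest links" entries from a few domains that reuse Next Episode's own URL structure) and step 1 can be sketched together. This is a minimal Python sketch under stated assumptions, not the author's actual process. It assumes the links export has already been loaded as a plain list of linking-page URLs. `find_mirror_domains`, `build_disavow_file` and the `min_matches` threshold are hypothetical names. The output follows the disavow file's plain-text format: `#` comment lines and one `domain:` entry per line.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_mirror_domains(linking_urls, own_paths, min_matches=3):
    """Return hosts whose linking pages reuse at least `min_matches` of our own paths.

    linking_urls: linking-page URLs, assumed to be pre-loaded from a links export.
    own_paths: paths that exist on our own site, e.g. {"/show-name", "/show-name/browse"}.
    """
    reused = defaultdict(set)  # host -> set of our paths it mirrors
    for url in linking_urls:
        parts = urlsplit(url.strip())
        host = parts.hostname or ""  # urlsplit lowercases the hostname
        path = parts.path.rstrip("/") or "/"
        if host and path in own_paths:
            reused[host].add(path)
    return sorted(h for h, paths in reused.items() if len(paths) >= min_matches)


def build_disavow_file(domains, note="Mirrors that proxy our content"):
    """Render disavow-file text: a '#' comment, then one 'domain:' line per domain."""
    # Normalise case and drop a leading "www." so duplicates collapse to one entry.
    cleaned = {d.lower()[4:] if d.lower().startswith("www.") else d.lower() for d in domains}
    lines = [f"# {note}"] + [f"domain:{d}" for d in sorted(cleaned)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    own = {"/show-name", "/show-name/browse", "/show-name/season-1"}
    links = [  # hypothetical export rows; .example hosts are placeholders
        "https://mirror-a.example/show-name",
        "https://mirror-a.example/show-name/browse",
        "https://mirror-a.example/show-name/season-1/",
        "https://blog.example/posts/show-name-review",
    ]
    print(build_disavow_file(find_mirror_domains(links, own)), end="")
```

The threshold is the judgment call here. A legitimate site linking to a couple of show pages also reuses some of the same paths, so the sketch treats a host as a mirror only when it reuses many of them.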
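Step 2, which the author ultimately decided against, works because the mirrors serve the site's own markup, including its scripts. Below is a minimal Python sketch of the idea. It assumes, based only on the single `window.loc1ion.href` example, that the mirrors' sanitizer is a plain substring replacement of `location`; the real rewriting rules are unknown. `CANONICAL_HOSTS`, `mirror_sanitize` and `build_redirect_snippet` are hypothetical names. The generated inline JavaScript assembles the property name at runtime, so the literal never appears in the page source. It then sends any visitor on a non-canonical hostname to the same path on the real site.

```python
# Hostnames on which the page is allowed to render (assumed; adjust for your site).
CANONICAL_HOSTS = ("next-episode.net", "www.next-episode.net")


def mirror_sanitize(html):
    """Hypothetical model of the mirrors' rewriting, inferred only from the one
    observed example: "window.location.href" -> "window.loc1ion.href"."""
    return html.replace("location", "loc1ion")


def build_redirect_snippet(canonical_hosts=CANONICAL_HOSTS):
    """Return inline JavaScript that sends visitors on a foreign hostname to the
    same path on the canonical site, without the literal word "location"."""
    host_list = ",".join('"%s"' % h for h in canonical_hosts)
    target = "https://" + canonical_hosts[0]
    return (
        "(function(){"
        'var l=window["loc"+"ation"];'  # property name assembled at runtime
        "if([" + host_list + "].indexOf(l.hostname)<0)"
        '{l.replace("' + target + '"+l.pathname+l.search);}'  # replace() keeps the mirror out of history
        "})();"
    )


if __name__ == "__main__":
    snippet = build_redirect_snippet()
    print("<script>" + snippet + "</script>")
    print(mirror_sanitize(snippet) == snippet)  # True: nothing for the modelled sanitizer to rewrite
```

The snippet calls `location.replace()` instead of assigning `href`, so the mirror page is not left in the visitor's history. The check would also fire on any staging or localhost hostname not listed in `CANONICAL_HOSTS`.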
        
       Author : santah
       Score  : 70 points
       Date   : 2021-02-11 19:21 UTC
        
       | throwaway13337 wrote:
       | Wow. That's absolutely horrible.
       | 
       | Looking at Google's search results, it's obvious that these
       | tactics are rampant and really winning the war here.
       | 
       | We need a new search engine that cannot be gamed so easily. I
       | know it's non-trivial, but the stakes are high, as is the reward
       | for building one.
       | 
       | This is a real engineering challenge. I'm excited about the
       | problem space and opportunity.
        
         | DylanDmitri wrote:
         | I think the real lesson is that under enough pressure every
         | large system leaks. Anything that gatekeeps millions of real
         | dollars (search engines, stock markets, Amazon reviews,
         | insurance claims, etc.) will constantly be exploited and
         | patched, by the nature of the thing. The only "solution" is to
         | decrease the pressure, say by fragmenting the market into 20+
         | search engines, so that SEO people can't realistically optimize
         | for all of them at once.
         | 
         | Some smaller-scope things can be made completely watertight,
         | for example mathematically proven cryptography, but even that
         | often leaks under government pressure.
        
         | judge2020 wrote:
         | Google's not bulletproof, but I don't think any search engine
         | that indexes literally every page could be built to block all
         | abuse.
        
         | post_below wrote:
         | "a new search engine that cannot be gamed"
         | 
         | So you mean a search engine that's 100% human-curated? Or
         | rather, a directory; it wouldn't really be a search engine.
         | 
         | Any algorithmic signal can be gamed. Although I'd be curious to
         | hear how I'm wrong about that.
        
           | ISL wrote:
           | Humans are also readily gamed. See, for example, Troy.
        
           | TameAntelope wrote:
           | Am I crazy to think a "human curated X" is less impossible
           | than we think it is, even at scale?
           | 
           | Imagine if you could upvote/downvote Google search results,
           | and got rewarded for being "right" or something...
        
       | Matsta wrote:
       | I'm sorry to say, but the neg SEO didn't drop your rankings; it
       | was down to the Google algorithm update [1]. Check the
       | screenshot from Ahrefs [2]: your traffic drops straight after
       | it.
       | 
       | [1] https://moz.com/blog/googles-december-2020-core-update
       | [2] https://i.imgur.com/DBkdUEk.png
       | 
       | Google's algorithm is smart enough to recognise neg SEO attacks.
       | Sure, five years ago you could buy a blast of spammy links using
       | Xrumer or GSA with some viagra anchor text and, boom, your
       | competition was gone.
       | 
       | At a quick glance, most of your pages have pretty thin content,
       | and I assume it's pulled from an API, so none of it is unique.
       | If there's one thing I would do, it's try to build some content
       | on those pages. A great tool for analysing and developing
       | SEO-friendly content is SurferSEO - I highly recommend it.
       | 
       | I'm surprised your forum doesn't rank as well as your main site,
       | as it looks fairly active. However, I'm not sure how PunBB does
       | SEO-wise.
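Matsta's point is that the drop lines up with the December 2020 core update rather than with the mirrors. One rough way to test that kind of claim against your own data is to compare average daily clicks before and after each candidate date. Whichever date shows the sharper break is the better fit. This is a minimal Python sketch, not how Ahrefs computes its charts. `traffic_change`, the 14-day window and the sample numbers are hypothetical.

```python
from datetime import date, timedelta
from statistics import mean


def traffic_change(daily_clicks, event_day, window_days=14):
    """Percent change in mean daily clicks: the `window_days` days starting on
    `event_day` versus the `window_days` days immediately before it.

    daily_clicks: dict mapping datetime.date -> clicks for that day.
    Missing days are skipped rather than treated as zero.
    """
    before = [daily_clicks[d] for d in
              (event_day - timedelta(days=i) for i in range(1, window_days + 1))
              if d in daily_clicks]
    after = [daily_clicks[d] for d in
             (event_day + timedelta(days=i) for i in range(window_days))
             if d in daily_clicks]
    if not before or not after or mean(before) == 0:
        raise ValueError("not enough traffic data around event_day")
    return (mean(after) - mean(before)) * 100 / mean(before)


if __name__ == "__main__":
    # Synthetic series: 1000 clicks/day that falls to 600 from day 30 onwards.
    start = date(2020, 11, 1)
    clicks = {start + timedelta(days=i): 1000 if i < 30 else 600 for i in range(60)}
    print(traffic_change(clicks, start + timedelta(days=30)))  # -40.0
    print(traffic_change(clicks, start + timedelta(days=15)))  # 0.0
```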
        
       | clscott wrote:
       | It's an interesting story. I wonder if you could turn the
       | automation trick around on them.
       | 
       | Would you be able to make them do the same negative SEOing but to
       | their own site?
       | 
       | Fill their site with unrelated garbage and internal links with
       | undesirable anchor text.
       | 
       | * unblock their IP
       | * create content that links back to their site with the
       |   undesirable keywords
       | * only show this content to them and not to regular visitors
       | * don't let them grab much / any legitimate content
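Both the author's step 3 and clscott's "only show this content to them" idea depend on recognising the mirrors' fetching IPs. The author did this in CloudFlare. An application-level equivalent can be sketched with Python's standard `ipaddress` module. `load_blocklist` and `is_blocked` are hypothetical names. The addresses below come from the reserved documentation ranges, not from the real mirrors. Listing CIDR ranges instead of single addresses might shorten the IP-switching chase the author describes, but only if the mirror's addresses share a range, which the post doesn't say.

```python
import ipaddress


def load_blocklist(entries):
    """Parse single IPs or CIDR ranges (IPv4 or IPv6) into network objects, skipping blanks."""
    # strict=False accepts entries like "198.51.100.7/24" that have host bits set.
    return [ipaddress.ip_network(e.strip(), strict=False) for e in entries if e.strip()]


def is_blocked(client_ip, networks):
    """True if client_ip falls inside any blocked network.

    An IPv4 address is never 'in' an IPv6 network (and vice versa), so mixed lists are safe.
    """
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    blocked = load_blocklist(["203.0.113.7", "198.51.100.0/24", "2001:db8::/32", ""])
    for ip in ["203.0.113.7", "198.51.100.200", "2001:db8::1", "192.0.2.1"]:
        print(ip, "blocked" if is_blocked(ip, blocked) else "allowed")
```

If the site sits behind CloudFlare, as it does here, the address to check is the original visitor IP that CloudFlare forwards in the `CF-Connecting-IP` request header. The socket peer would be a CloudFlare edge address.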
        
       | stickfigure wrote:
       | Earlier today, in another thread, we were joking about GaaS
       | (Goatse as a Service), but now I think maybe that's not such a
       | crazy idea after all.
       | 
       | https://news.ycombinator.com/item?id=26104087
       | 
       | Your YC application practically writes itself.
        
       ___________________________________________________________________
       (page generated 2021-02-11 23:01 UTC)