[HN Gopher] Google is the only search engine that works on Reddi...
___________________________________________________________________
Google is the only search engine that works on Reddit now, thanks
to AI deal
Author : turkeytotal
Score : 352 points
Date : 2024-07-24 13:41 UTC (9 hours ago)
(HTM) web link (www.404media.co)
(TXT) w3m dump (www.404media.co)
| VoidWhisperer wrote:
| Wow, reddit found a way to make themselves even less useful
| somehow. After the API fiasco, that seemed like it'd be pretty
| hard to do.
| wvenable wrote:
| But, apparently, they did finally find a way to make money.
| LunaSea wrote:
| Barely enough to pay the CEO
| jasode wrote:
| _> But, apparently, they did finally find a way to make
| money._
|
| The most recent 10-Q financial results for 2024-03-31 (filed
| 2024-05-08) show they actually _lost_ money:
| https://www.sec.gov/edgar/browse/?CIK=1713445
|
| (For 2024-Q1, Reddit _lost $575 million_ on revenue of $242
| M.)
|
| If the quoted _" $60 million deal"_[1] from Feb 2024 is
| accurate, that small amount from Google may not be enough for
| Reddit to turn a profit. It remains to be seen what the Q2 or
| Q3 financials will show.
|
| [1] https://www.google.com/search?q=google+ai+deal+reddit
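The figures quoted above can be put side by side with a little arithmetic. This is only a rough sketch: it assumes the quarterly loss is simply costs minus revenue (ignoring interest, taxes and one-off items), and it compares the "$60 million" figure to a single quarter without knowing what period the deal actually covers. The variable names are illustrative, not taken from any filing.

```python
# Figures as quoted in the comment above, in millions of USD.
revenue_musd = 242    # 2024-Q1 revenue
net_loss_musd = 575   # 2024-Q1 net loss
deal_musd = 60        # quoted size of the Google deal (period not stated)

# Assumption: loss = costs - revenue, so costs = revenue + loss.
implied_costs_musd = revenue_musd + net_loss_musd

# How much of one quarter's loss the quoted deal amount would cover.
deal_share_of_loss = deal_musd / net_loss_musd

if __name__ == "__main__":
    print(f"Implied quarterly costs: ${implied_costs_musd}M")
    print(f"Deal as share of the Q1 loss: {deal_share_of_loss:.1%}")
```

Under those assumptions the quoted deal amount covers roughly a tenth of the single-quarter loss, which is consistent with the comment's point that it may not be enough on its own.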
| wuiheerfoj wrote:
| Wow, perhaps I'm naive but what the hell are they spending
| over $800M a year on? That seems an obscene amount for a
| glorified message board.
|
| I just read they have 2000 employees which is also puzzling
| to me
| toomuchtodo wrote:
| They were a public good currently larping as a for profit
| concern now run by a vanity and wealth driven executive
| driving it into the ground while it flails to monetize
| when that is likely incompatible with the entity.
|
| Compare and contrast to say, HN, run on two servers in a
| colo with less than a handful of mods.
| splwjs wrote:
| it's not just a message board, it's an influence machine.
|
| They need to make sure the stuff they want people to
| think is posted often and has a big number next to it,
| they need to make sure things that people like are
| associated with the stuff they want people to
| like/think/do and things that people don't like are never
| associated with the stuff they want people to
| like/think/do. They need to make sure that people who say
| the wrong things are silenced or persuaded to leave, etc
| etc. Man they probably have at least one contact in at
| least one intelligence agency and they have to make sure
| not to run afoul of that contact.
|
| Like the news isn't just a list of what happened
| recently, political debates aren't just two guys talking,
| and reddit/twitter aren't just message boards.
| alephxyz wrote:
| They spent 400M on R&D this quarter, which means more
| "personalisation"/ad targeting and probably cooking up
| some DOA chatbot/assistant product that's costing them a
| ton in compute
| some_random wrote:
| Almost 200 million is in CEO compensation
| https://www.statista.com/statistics/1453196/reddit-top-
| execu...
| Hikikomori wrote:
| The only thing it does for me is force me to use Google, as a
| large number of the answers I need are on reddit.
| immibis wrote:
| That's what Google is paying them for :)
| brewdad wrote:
| So then this gambit worked. It sucks and I hate it. I will
| continue to use DDG/Bing first but it looks like I'll be
| hitting up Google more often too.
| WarOnPrivacy wrote:
| > The only thing it does for me is force me to use Google
|
| Startpage, Kagi and Lukol are 3 that source from Google. I
| imagine there are others.
| stainablesteel wrote:
| which is ironic because pre-AI every solid piece of obscure
| information and non-programming question usually had an answer
| on reddit, it's an extremely valuable dataset looking back. but
| moving forward i think it's only going to become less valuable
| and people will probably manually/custom-scrape all the
| questions out of worthwhile subreddits and open up their data
| for free
| splwjs wrote:
| When I was young, my brother knew a guy who was really into
| movies. If you wanted to know about a movie you couldn't
| remember, you would go talk to that guy.
|
| For a while, the internet had an end-run play that made that
| guy less useful. You can just go on the internet for obscure
| movie information, buddy.
|
| But now it seems like knowing a movie guy is going to be the
| only way to get a real person's opinion on movies. The
| internet is about to forget everything without a profit
| motive and just start telling you that the latest product
| from a monolith corp like disney is the only movie worth
| watching. If someone scrapes all the useful movie opinions
| off of reddit and spends their time crafting it into a usable
| format, that guy's probably got a company. But not Bill.
| Bill's just a guy you can know or not know. You can't
| monetize knowing Bill. Sidenote that's probably why it irked
| me so bad when some bozo coined the phrase "social capital".
| splwjs wrote:
| If they kept their API open then by now the entirety of the
| site would be ai slop that was built with chatgpt and launched
| with the api.
|
| Then again most of what that site does is just blend and
| regurgitate the information that's currently on it anyway.
| miohtama wrote:
| Those AI bots would likely be more intelligent commenters
| than Redditors
| abdullahkhalids wrote:
| The API changes and these robots.txt were part of the same
| strategy - preventing third parties from scraping their data
| and reducing the AI generated content that makes it into their
| data. So they can sell that data and make money.
| kjkjadksj wrote:
| Their dataset is already polluted with misinformation
| campaigns and shilling
| AlexandrB wrote:
| > their data
|
| Love how it's their data when it might make them money but
| not their data if they get sued.
| abdullahkhalids wrote:
| That's fair. I agree with you that in some sense it is user
| data. And that Reddit is operating unethically.
| popcalc wrote:
| # Welcome to Reddit's robots.txt
| # Reddit believes in an open internet, but not the misuse of public content.
| # See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
| # See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
| # policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy
|
| User-agent: *
| Disallow: /
|
| Source: https://www.reddit.com/robots.txt
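To see what the two directive lines quoted above mean for a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are pasted in as a string rather than fetched, the helper name `is_allowed` and the bot name `ExampleBot` are made up for illustration, and the check only models what a compliant client would conclude from this publicly served file.

```python
from urllib.robotparser import RobotFileParser

# The directives from the robots.txt quoted above (comment lines are
# ignored by the parser, so only one is kept here for context).
ROBOTS_TXT = """\
# Welcome to Reddit's robots.txt
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a crawler that honours robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.reddit.com/r/programming/"
    for agent in ("ExampleBot", "bingbot", "Googlebot"):
        print(agent, is_allowed(agent, url))
```

Every agent is refused under this file, Googlebot included, because the wildcard rule applies to all of them. Any exception for a particular crawler would have to come from a different file being served to it or from an agreement outside robots.txt, neither of which this sketch can show.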
| will0 wrote:
| Looks like it changed a month ago:
|
| https://old.reddit.com/r/redditdev/comments/1doc3pt/updating...
| immibis wrote:
| Nobody who wants to be successful obeys robots.txt. And I do
| mean nobody.
| chippiewill wrote:
| They changed it to disallow so that scrapers can't just claim
| the robots.txt gave them permission.
| toomuchtodo wrote:
| Independent scrapers can launder the data between Reddit
| and AI consumers. The only folks this hurts is users
| seeking info via search engines and folks willing to kowtow
| to rules that are potentially low impact to evade. Next
| steps would be (from an adversarial perspective) browser
| extensions that stream back data for ingestion similar to
| Recap for Pacer [1].
|
| [1] https://free.law/recap/faq
|
| (full disclosure: assisting someone pursuing regulatory
| action against reddit in the EU for a separate issue from
| scraping, it's a valuable resource, but the folks who own
| and control it are meh)
| tedivm wrote:
| According to the US court systems the robots.txt file is
| meaningless. If they respond with a 200 status code giving
| you the access then you can legally scrape it all you want.
| If they require that you log in then you have to follow the
| terms you agree to when creating an account. Public means
| public though, and if Reddit doesn't want to make the
| content private (put it behind a login) then we can scrape
| away.
|
| Note that scraping, regardless of the level of permission,
| doesn't mean you can do anything you want with the content.
| Copyright still applies. But you can scrape it, and if your
| use falls under Fair Use or another caveat to the copyright
| laws then you can go ahead and do it without needing any
| permission from the authors.
| JohnFen wrote:
| Sadly true. That's why I gave up on robots.txt years ago and
| started blocking crawlers outright in .htaccess
|
| Of course, that became unsustainable so now I have everything
| behind a login wall.
| sunaookami wrote:
| They serve a different robots.txt to Google:
| https://merj.com/blog/investigating-reddits-robots-txt-cloak...
|
| You can see it here: https://search.google.com/test/rich-
| results/result?id=_mYogl... (click on "View Tested Page")
| dogleash wrote:
| > # Reddit believes in an open internet, but not the misuse of
| public content.
|
| Calling it "public" content in the very act of exercising their
| ownership over it. The balls on whoever wrote that.
| pas wrote:
| it's even worse. it's not theirs (it's the users'), they are
| merely hosting it and using it (ToS gives them a fancy
| irrevocable license I guess).
|
| so they can do whatever they want with it and the actual
| owners/authors have no chance to really influence Reddit at
| all to make it crawlable. (the GDPR-like data takeout is
| nice, but ... completely useless in these cases where the
| value is in the composition and aggregation with other users'
| content.)
| deepfriedbits wrote:
| On top of that, a sizable chunk of Reddit content is ripped
| from elsewhere, whether videos, images, etc.
| visarga wrote:
| > the GDPR like data takeout is nice
|
| Is there a way to export my history? How?
| pas wrote:
| https://www.reddit.com/settings/data-request
|
| (and there's some help article for it that I didn't read,
| but google found this first
| https://support.reddithelp.com/hc/en-
| us/articles/36004304835... that's how I got to the link)
| throwaway290 wrote:
| actually owners/authors like me would not want our stuff
| crawlable because that gives up our ownership.
|
| When I am answering some random dude on reddit with a
| problem I want _that dude_ to read my solution. I don't
| want this to be crawled and forever stored (probably
| deanonymized) or enshrined in a dozen commercial LLMs.
| There is substack for that stuff.
| raverbashing wrote:
| With the amount of crap in Reddit, cleaning it must be a very
| non-trivial problem. (I mean, it never is, but in the case of
| Reddit it's probably extra complicated)
| Elfener wrote:
| I mean, the reddit company did go public, so things like this
| were inevitable.
|
| Also things like the API fiasco, and also small annoyances like
| the fact that when you click on an image on reddit, it now goes
| to a wrapper html page instead of just the actual image (this was
| one of the reasons reddit was better than most social media...).
| mrec wrote:
| Maybe it's just me or something temporary (I use Old Reddit,
| like all right-thinking folk) but for the past couple of days
| the image wrapper page seems to have been sent to the glue
| factory. I'm just getting the image now, unadorned.
| lifestyleguru wrote:
| I deeply regret every minute spent on and kilobyte of text
| contributed to reddit.
| Ylpertnodi wrote:
| I don't. There's nothing around that is similar...with the same
| traction. The various 'verses are variations on cat pics. I'm
| still looking, though.
| wccrawford wrote:
| While it's still not Reddit, I've been enjoying Lemmy. I
| have a similar range of communities on each, and other than
| some annoying groupthink, the content is often similar.
|
| And to me, forgetting to log in to each of them feels
| similar, too. For what that's worth. (I hate both of them
| when not logged in.)
| trallnag wrote:
| I can confidently state that I'm a net negative for Reddit,
| looking at the dozens of banned accounts in the trash bin of my
| KeePass vault
| card_zero wrote:
| I mostly contributed to r/nonsense and I'm pleased by the
| thought of that sub's content being used to train future AI,
| with information about the architectural uses of super-tall
| chef's hats, the prehistoric invasion of Europe by Beak People,
| and so forth.
| nerfbatplz wrote:
| I propose we change the term _enshitification_ to
| _engoogleification_ in regards to the internet.
| crazygringo wrote:
| This is about _Reddit_ disallowing other search engines.
|
| Blame Reddit, not Google.
| dvngnt_ wrote:
| plenty of blame to go around
| crazygringo wrote:
| You'll have to demonstrate that.
|
| Is Google's contract with Reddit exclusive, so that other
| search engines aren't given the opportunity to also pay?
|
| I highly doubt that, especially since the DOJ would go
| after that immediately because of antitrust.
|
| So no, pretty sure the blame here is 100% on Reddit unless
| you have evidence otherwise.
| dvngnt_ wrote:
| I don't think the DOJ acts immediately.
|
| > so that other search engines aren't given the
| opportunity to also pay?
|
| this makes it harder for new engines if google has
| exclusive deals with some of the most popular sites
| crazygringo wrote:
| > _if google has exclusive deals_
|
| My comment said, show me _that_ the Google deal with
| Reddit is exclusive.
|
| You haven't done that.
|
| And there's no reason to think it would be, because of
| antitrust. The DOJ doesn't have to act "immediately", the
| point is that obvious antitrust violations come with
| fines that make it unprofitable to attempt in the first
| place. And this would be black-and-white obvious
| antitrust violation, given Google's monopoly status in
| search. This isn't a gray area where it might be worth it
| for Google to roll the dice.
| dvngnt_ wrote:
| clearly some deal was reached between the two parties or we
| wouldn't be here.
|
| whether or not the deal is exclusive OR companies have to
| pay to index reddit it's still bad for competition. money
| has a barrier to entry preventing newcomers.
|
| I can blame reddit for creating the deal and I can blame
| google for accepting the deal if the effect is bing, ddg
| and others cannot display reddit results without reaching
| some deal.
| crazygringo wrote:
| I'm not saying it's not bad for competition.
|
| I'm saying the blame is 100% with Reddit.
|
| Blaming Google for accepting it makes no sense. That's
| like if a shopper goes to grocery store and buys an
| expensive $20 piece of cheese, and other shoppers can't
| afford cheese that pricey, and you're blaming that one
| shopper for buying it because it means other shoppers
| can't also get the cheese without paying for it. That
| doesn't make any sense. The _store_ set the price, and
| they're the one to blame if other shoppers can't afford
| it.
|
| If Bing, DDG and others can't reach a deal with Reddit,
| that has _nothing_ to do with Google.
|
| Again, blame here is _100%_ on Reddit, and 0% on Google.
| To assign blame to a _purchaser_ in a case like this
| doesn't make any sense.
| dvngnt_ wrote:
| bing probably has the money to reach a deal, smaller
| companies without monopolies is less likely, and that's
| the problem.
|
| i don't think google is blameless like you propose.
| crazygringo wrote:
| > _bing probably has the money to reach a deal, smaller
| companies without monopolies is less likely, and that's
| the problem._
|
| Reddit can charge smaller companies less money. So if
| there's a problem, again, the problem is _100% with
| Reddit_.
|
| Google is absolutely blameless here. You may not like
| Google, and you can certainly blame them for plenty of
| other things. But in this situation, _literally all of
| the blame is with Reddit_ for deciding to remove their
| content from all search engines unless they pay. Reddit
| didn't have to do that. Google didn't make them do that.
|
| Reddit did this. Not Google.
| dvngnt_ wrote:
| takes two to tango. reddit can't do anything without google
| signing papers as well
| bryan_w wrote:
| I don't think you're right about that.
| frizlab wrote:
| I back up this proposal.
| debacle wrote:
| Reddit has been ripe for disruption for years. It's just waiting
| on an inflection point and someone to take it behind the barn.
| onlyrealcuzzo wrote:
| Or for Google to buy it.
|
| They could monetize it much better while being less annoying.
|
| Ultimately - Google is getting everything they want from Reddit
| with this deal without having to buy it outright.
|
| Short of Reddit transforming to an entirely different product
| (difficult) - I'm not sure where the major growth opportunity
| is for it.
| rob74 wrote:
| It wouldn't be the first time they have done something like
| this either. Remember
| https://en.wikipedia.org/wiki/Google_Groups ?
| Suppafly wrote:
| >Remember https://en.wikipedia.org/wiki/Google_Groups
|
| It'd be somewhat hilarious if google bought reddit just to
| archive it and shut it down.
| jessriedel wrote:
| Very few of the reddit users who are providing the content for
| free are motivated by which search engines are allowed to index
| the content, so I don't see how this would make it more ripe
| for competition. (If you just mean society would now be even
| better off if reddit were disrupted, ok, maybe, but that's a
| different thing.)
| crazygringo wrote:
| The network effects are too strong.
|
| Remember, the only reason Reddit "won" was because Digg
| destroyed itself with a radical upgrade that everyone hated.
|
| Reddit would have to do something similarly self-inflicted, and
| I can't even guess where people would go. Reddit was already an
| alternative to Digg -- what's the alternative to Reddit? I
| mean, it's certainly not Quora.
| tayo42 wrote:
| Reddit is quietly a huge website with a significant amount of
| users. So many people use it but don't talk about it. Google
| search says 1 billion MAU? Twice as big as Twitter
| nope1000 wrote:
| There is Lemmy for example, very similar to old Reddit. The
| big problem is the missing content outside of mainstream
| communities.
| NoMoreNicksLeft wrote:
| It was already dead by then. Really, it was the various
| Slashdot exoduses... sites like K5 got large initial boosts,
| but stumbled and started to deteriorate. If the Digg exodus
| is what sent you to Slashdot, chances are you're the kind of
| user everyone else was trying to escape.
|
| >what's the alternative to Reddit? I mean, it's certainly not
| Quora.
|
| If it was deliberate I certainly can't tell, but one of the
| characteristics of Reddit is that it caused so many other
| little tiny internet forums to just wither away. Most were
| visually unappealing, running some ancient phpbb software or
| whatever, but there were so many like stars in the night sky.
| Now, if they're even still running, you look for the newest
| post, and it will say "November 2023". Hell, the only reason
| they are still running is that the credit card number on file
| paying for hosting doesn't expire until next year somehow.
| Reddit is a red tide algae choking out all life in the ocean,
| nothing else gets to exist anymore.
| bobajeff wrote:
| I think you might be onto something with the observation
| about people moving from old forum software like phpbb to
| subreddits.
|
| It's like what happened to personal websites when things
| like Blogger, Tumblr and Facebook popped up.
|
| It's hard to beat something that is easy to set up and pays
| for hosting but still lets you control moderation. It's a
| no brainer.
|
| Managing your own domain where users post content is a
| minefield of problems these days even if you didn't mind
| the cost of running it.
|
| Something like this might also explain the move to things
| like Discord over IRC.
| Suppafly wrote:
| >Reddit was already an alternative to Digg -- what's the
| alternative to Reddit?
|
| This site is essentially 'orange reddit', they just need to
| add sub-HNs or tagging or something and it'd be ready for an
| influx of reddit refugees. Not that any of us really want it,
| but it's possible.
| CSMastermind wrote:
| I don't think this is true.
|
| The main thing I see Reddit being useful for are discussions
| about entertainment.
|
| There's probably a subreddit for your favorite sports team,
| Twitch streamer, TV show, book series, video game, politics
| (which is entertainment for some people).
|
| Reddit has seriously degraded the experience of a lot of
| these communities with things like restricting custom CSS.
|
| It seems to me that the way you'd disrupt Reddit as a startup
| is to pick a vertical and laser focus on becoming the best
| discussion board for that community. If it's sports then have
| integrations for live stats, scores, etc.
|
| In general you could attract users by offering profit sharing
| on ads the same way Youtube does for creators.
|
| Have the best moderation tools in the world, a constant
| painpoint with Reddit. Give admins more flexibility over the
| appearance of the board, all things Reddit took away.
|
| The other path for disruption would be if an established
| company with those communities tackled the problem. Lots of
| communities already use Discord, but they tend to also have a
| subreddit because chat and forums are different communication
| methods. Discord could easily offer a forum product as an
| extension of their chat services. If they do it well they'd
| drive a lot of users away from the subreddits.
| api wrote:
| Network effects are more powerful than we are. Witness the
| number of people who despise Xhitter but are still on there.
| Once something has a sufficient network effect they become
| immune to normal market forces and able to abuse their position
| with near impunity.
| bdw5204 wrote:
| The strange thing to me is how everybody keeps trying to make
| distributed Twitter happen when distributed Reddit is the low
| hanging fruit for federated social media.
|
| You don't want to end up banned from a movies forum because you
| also participate in a political forum. Federation solves that
| problem because you can use separate accounts without either
| forum knowing that you also use the other.
| ravetcofx wrote:
| This exists with Lemmy already and is fostering nice
| communities (and due to ActivityPub is interoperable with
| Mastodon accounts)
| ks2048 wrote:
| I like it in principle, but after watching the situation with
| Twitter clones, I'm not too optimistic on federated services
| taking off.
|
| I would like to see a wikipedia-style system for
| Twitter/Reddit: open access data, non-profit.
| teabee wrote:
| Is this not just what the internet was before reddit? What
| features would "distributed reddit" have that an internet
| full of independent community forums would be missing?
| psunavy03 wrote:
| They had this years ago, and they were called "forums."
| Suppafly wrote:
| >The strange thing to me is how everybody keeps trying to
| make distributed Twitter happen when distributed Reddit is
| the low hanging fruit for federated social media.
|
| Honestly, it's strange to me how hard people are trying to
| make distributed anything happen. Federation mostly solves a
| problem that real people don't have or care about.
| Yawrehto wrote:
| >Honestly, it's strange to me how hard people are trying to
| make distributed anything happen.
|
| IMO, something federation is very good at is solving one
| slow-moving problem - enshittification of social platforms.
| It's not immune, of course, but an Elon Musk-style takeover
| is much harder with Mastodon than Twitter, and it would be
| hard to run it into the ground in other ways because the
| platforms are owned by different people and groups.
| escapecharacter wrote:
| Man, I just want to be able to search the entire internet for
| when I'm doing niche research.
|
| Does this mean there will be a future where everyone is running
| their own crawler? I suppose.
| causal wrote:
| It feels like Reddit is approaching an inflection point anyway
| where bot-made content is concentrated enough to spoil the whole
| experience. Closed servers like Discord and Slack may be the last
| haven of online human interaction.
| onlyrealcuzzo wrote:
| This is an interesting development.
|
| How many other sites might have leverage to charge to be indexed?
|
| I don't want to live in a world where you have to use X search
| engine to get answers from Y site - but this seems like the
| beginning of that world.
|
| From an efficiency perspective - it's obviously better for
websites to just lease their data to search engines than both
| sides paying tons of bandwidth and compute to get that data onto
| search engines.
|
| Realistically, there are only 2 search engines now.
|
| This seems very bad for Kagi - but possibly could lead the old,
cool, hobbyist & un-monetized web being reinvented?
| ColinHayhurst wrote:
| Kagi uses at least Google and Mojeek
|
| edit:
|
| > Realistically, there are only 2 search engines now.
|
| https://seirdy.one/posts/2021/03/10/search-engines-with-own-...
| WarOnPrivacy wrote:
| > Realistically, there are only 2 search engines now.
|
| From the article: Many alternatives to GBY
| [Google, Bing, and Yandex] exist, but almost none of them
| have their own results;
|
| This seems to assert that ~0 other search providers do any
| crawling at all. Ever. Are we sure that's the case?
| (they could crawl but never ever return those results == more
| odd).
| ColinHayhurst wrote:
| It's a very long article so understandable that you did not
| read on and learn about other search engines crawling
| beyond GBY. Still there are indeed very few that are
| crawling at web scale, and internationally. We are at 8
| billion pages and totally independent [0], hence expressing
| our concerns to 404 media after being blanked by Reddit.
|
| [0] https://www.mojeek.com/about/why-mojeek
| WarOnPrivacy wrote:
| > did not read on and learn about other search engines
| crawling beyond GBY. Still there are indeed very few that
| are crawling at web scale, and internationally
|
| That's helpful clarification.
|
| In criticism of the article, you might agree that
|
| _none of them have their own results_
|
| is a fairly absolute statement. It signals: Final word on
| the matter; no nuance to follow.
| topaz0 wrote:
| Omitting the "almost" from "almost none" makes it sound
| disingenuously more absolute than it actually is.
| shadowgovt wrote:
| I mean, I didn't read on because it's paid.
|
| I'm not taking their reporting without compensation, but
| that also means I didn't have the whole story. Such is
| life in this era of the internet.
| smallerize wrote:
| It's not paid.
|
| _Sign up to support our work and for free access to this
| article_
| MichaelZuo wrote:
| Bing provides far fewer verbatim results for pretty much
| all search queries that I've tested.
|
| And Yandex isn't much better for non cyrillic search, Baidu
| is only for the Chinese web effectively.
|
| And all other search engines either don't even attempt to
| do full web crawls anymore and/or buy from one of the four
| above.
|
| So realistically there's just one search engine for the
| full web that actually does the work.
| WarOnPrivacy wrote:
| > And Yandex isn't much better for non cyrillic search,
|
| I like Yandex when I'm rabbit-holing after obscure
| musicians/music. I routinely have a better experience
| than I do with DDG or Kagi or Goog.
| MichaelZuo wrote:
| It's also vastly better for finding livejournal blogs.
| dev1ycan wrote:
| Brave has their own search engine, yandex I only use for
| reverse image search, baidu's interface is really clean
| and feels like old school google... but I don't speak
| chinese so I can't use it.
|
| I hope that one day they get a western version
| MichaelZuo wrote:
| Brave doesn't have its own index of the full web, and
| it's even less useful than Yandex. And very likely buys
| some of it, according to what I've heard. So it falls
| into the last category.
| em-bee wrote:
| if that is true then they are lying on their site where
| they claim: " _Brave Search operates from a fully
| independent search index_ "
|
| do you have any reference for your claim?
|
| i use brave search and find it very useful. very rarely
| there is something i can't find, and when i run into that
| other search engines are not much better.
| culi wrote:
| I believe Brave Search is also starting their own index.
| There are some tiny independent indexes too:
|
| https://www.crawlson.com/ https://search.marginalia.nu/
| https://wiby.me/ https://searchmysite.net/
| darreninthenet wrote:
| I believe Kagi has its own crawler as well and it merges
| all the results and does whatever Kagi does behind the
| scenes to show the mix
| Yawrehto wrote:
| Doesn't it list three major ones, Google, Bing, and Yandex,
| plus Mojeek and a few other small ones? That's a bit more
| than two.
| McDyver wrote:
| That seems like the business model for streaming. You subscribe
| to X provider to watch Y series. So, as for streaming, I
| suppose a pirate bay search engine will come up
| toomuchtodo wrote:
| Pirate Bay is probably not the most optimal analogy, more
| like Anna's Archive imho [1], individually offered by web
| property scrape runs compressed into a package, maybe served
| by torrents like this Academic Torrents site example [2].
|
| Scraper engine->validation/processing/cleanup->object
| storage->index + torrent serving is a rough pipeline sketch.
|
| [1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false
| &qu... ("HN Search: annas archive")
|
| [2] https://academictorrents.com/details/9c263fc85366c1ef8f5b
| b9d... ("AcademicTorrents: Reddit comments/submissions
| 2005-06 to 2023-12 [2.52TB]")
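The pipeline outlined above (scraper, validation/cleanup, object storage, index) can be illustrated with a toy, in-memory version. Everything here is a stand-in chosen for illustration: the scraper returns canned documents instead of touching the network, a dict keyed by content hash plays the role of object storage, the index is a simple word-to-document map, and the torrent-serving stage is left out entirely. All function and class names are hypothetical.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    url: str
    text: str


def scrape() -> list[Doc]:
    """Scraper stand-in: returns canned documents, no network access."""
    return [
        Doc("https://example.com/a", "  How to fix a leaky faucet  "),
        Doc("https://example.com/b", ""),  # empty body, dropped in cleanup
        Doc("https://example.com/c", "Faucet washers wear out over time"),
    ]


def clean(docs: list[Doc]) -> list[Doc]:
    """Validation/cleanup: trim whitespace and drop empty documents."""
    return [Doc(d.url, d.text.strip()) for d in docs if d.text.strip()]


def store(docs: list[Doc]) -> dict[str, Doc]:
    """Object-storage stand-in: dict keyed by SHA-256 of the content."""
    return {hashlib.sha256(d.text.encode()).hexdigest(): d for d in docs}


def build_index(objects: dict[str, Doc]) -> dict[str, set[str]]:
    """Inverted index: lowercase word -> keys of objects containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for key, doc in objects.items():
        for word in doc.text.lower().split():
            index[word].add(key)
    return index


def search(index: dict[str, set[str]], objects: dict[str, Doc], word: str) -> list[str]:
    """Return the sorted URLs of stored documents containing word."""
    return sorted(objects[key].url for key in index.get(word.lower(), ()))


if __name__ == "__main__":
    objects = store(clean(scrape()))
    index = build_index(objects)
    print(search(index, objects, "faucet"))
```

Keying the store by content hash means identical scrapes collapse into one object, which is one plausible reason to put a storage layer between cleanup and indexing; a real system would make different choices at every stage.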
| splwjs wrote:
| idk man i bet you five bucks and a handshake it's just going to
| play out like the existing startup grift.
|
| There's an established player with institutional protections,
| then a scrappy upstart takes a bunch of VC money, converts it
| into runway, gives away the product for free, gradually
| replaces and becomes the standard, then puts out an s-1
| document saying "we don't make money and we never have, want to
| invest?" and then they start to enjoy all the institutional
| protections. Or they don't. Either way you pay yourself
| handsomely from the runway money so who cares.
|
| The upstart gets indexed and has an API, the established player
| doesn't.
|
| The upstart is more easily found and modular but the
| institutional player can refuse to be indexed to own their data
| and they can block their API to prevent ai slop from getting in
| and dominating their content.
| gtirloni wrote:
| _> but this seems like the beginning of that world._
|
| It's not the beginning, it's mere continuation.
|
| Walled gardens have existed since the AOL days. They
| deteriorate over time but it doesn't prevent companies from
| trying (each time, in bigger attempts).
| dvngnt_ wrote:
| site:reddit.com works for kagi for new posts this week?
| rozab wrote:
| Basically all 'independent' search engines piggyback off Google
| or Bing
|
| https://help.kagi.com/kagi/search-details/search-sources.htm...
|
| >Our search results also include anonymized API calls to all
| major search result providers worldwide
| ColinHayhurst wrote:
| >Basically all 'independent' search engines piggyback off
| Google or Bing
|
| Incorrect: https://www.mojeek.com/about/why-mojeek
| Suppafly wrote:
| weird, I've never heard of mojeek before and this is the
| 2nd comment in this thread I've seen mentioning it.
| zamadatix wrote:
| As of the time of writing there are 8 search matches in
| this thread: 1 from you, the rest from Colin (CEO of said
| company).
| Suppafly wrote:
| > the rest from Colin (CEO of said company)
|
| I assumed he had financial connection to them, but didn't
| want to take the time to research it. Mojeek is the new
| fetch.
| rozab wrote:
| Spam is fine by me if it's from the CEOs personal
| account, lol. Clearly I wasn't familiar with the product
| so it's a helpful comment for me
| em-bee wrote:
| also brave: https://brave.com/search/#independent
| AndroidKitKat wrote:
| Kagi gets part of their index from Google, per the article, so
| perhaps that's the reason Kagi still works. Wonder if Vlad and
| Kagi will do (or have done) the calculus to see if buying
| crawlability from Reddit itself is cheaper than buying results
| from Google for Reddit search.
| hugh_kagi wrote:
| Not yet but it's something we want to look into.
| ColinHayhurst wrote:
| Kagi pays to use APIs from Mojeek and Google
| karaterobot wrote:
| From the second paragraph of the article:
|
| > Searching for Reddit still works on Kagi, an independent,
| paid search engine that buys part of its search index from
| Google.
| dvngnt_ wrote:
| thanks i only read the first paragraph. then i went to kagi
| discord and they provided more context
| lpod wrote:
| Interesting move by Reddit to lock down their search
| functionality to just Google. I guess this means Bing and others
| are out of luck. Seems like another step towards the walled
| garden approach - good for ad revenue, but probably not great for
| user choice. Wonder how long it'll be before other platforms
| follow suit?
| jedberg wrote:
| They changed robots.txt a month or so ago. For the first 19 years
| of life, reddit had a very permissive robots.txt. We allowed all
| by default and then only restricted certain poorly behaved agents
| (and Bender's Shiny Metal Ass(tm))
|
| But I can understand why they made the change they did. The data
| was being abused.
|
| My guess is that this was an oversight -- that they will do an
| audit and reopen it for search engines after those engines agree
| not to use the data for training, because let's face it, reddit
| is a for profit business and they have to protect their income
| streams.
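| For illustration, here is a minimal sketch of how the two kinds of
| robots.txt policy described above are read by a well-behaved crawler,
| using Python's standard urllib.robotparser. The file contents, the
| agent names (BadBot, ExampleBot) and the example.com URL are
| hypothetical; they are not Reddit's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical permissive policy: everyone is allowed by default,
# and one named, poorly behaved agent is shut out.
PERMISSIVE = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

# Hypothetical restrictive policy: everyone is disallowed by default.
RESTRICTIVE = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if robots_txt permits agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/r/test/"
    for name, policy in (("permissive", PERMISSIVE), ("restrictive", RESTRICTIVE)):
        for agent in ("ExampleBot", "BadBot"):
            print(name, agent, allowed(policy, agent, url))
```

| Note that the check runs on the crawler's side: robots.txt is a
| request that a client chooses to honor, not an access control.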
| JohnMakin wrote:
| One (in this case, 2) company's incentive for profit should not
| take priority over the usability/well being of the internet as
| a whole, ever, and is exactly why we are where we are now. This
| is an absolutely terrible precedent.
| jedberg wrote:
| I agree with you in theory, but in practice someone has to
| pay for all this magic.
| JohnMakin wrote:
| This is a false dichotomy. You can have services, and not
| have them devolve into complete unusability in the name of
| profit. This isn't sustainable either. The myopic pursuit
| of short term gains at the expense of the product will
| collapse at some point in the future, no matter how much
| you believe in this weird frog-boil internet we've
| inherited now.
| talldayo wrote:
| > The myopic pursuit of short term gains at the expense
| of the product will collapse at some point in the future,
|
| The myopic pursuit of short-term gains is the only
| playbook that works. Long-term business strategy is a
| gamble, and today's businesses have all learned that
| they'd rather make hay when the sun is shining than be
| remembered as a good business.
|
| Twitter tried a long-term playbook to reverse their
| unprofitable sinkhole of a website. That ended up with
| them being undervalued and sold to the highest bidder.
| twelve40 wrote:
| Complete unusability is when ai tools clone the content
| and people stop visiting the original service and
| participating. I'll leave it up to them to defend
| blocking duck duck go for example, but blocking "AI" bots
| for an online community is a matter of survival at this
| point.
| talldayo wrote:
| Alternatively, it's because the base platform has also
| devolved into unusability. Both Reddit and Twitter are in
| a position where their info is easily scraped, and their
| community is barely worth the advertising/paid-premium
| experience they demand from you. As both platforms
| continue to decline in quality, you might not even need
| to replace the original service. Both businesses appear
| intent on getting replaced.
| ToucanLoucan wrote:
| We did. As in we, the Internet, existed for a long time
| without anyone making money and we paid for the privilege.
| Websites were built and hosted at owner's expense, _for
| years,_ with no expectation that they be financially
| rewarded. Sure some would run donation drives, or work with
| sponsors relevant to the community in question, but a whole
| ton, mine included, just cost me a lot of money over many
| years.
|
| Those websites were definitely technically inferior, as the
| march of progress is unavoidable, but web hosting is
| cheaper than it's ever been. A VPS that utterly blows away
| what mine was capable of in 2007 for nearly a hundred a
| month can now be had for about $10 per month. Yet everyone
| wants these monolith platforms, but even that wouldn't be
| the worst thing ever, except that every one of these
| platforms has a backend to support that we in the Old
| Internet never did: a C-suite's worth of executives and
| millions of shareholders, who for some reason have decided
| that reddit can't exist unless reddit makes them reams and
| reams of money.
|
| I'd be very, very interested to see how much of, even
| what's probably the most massive one of all, Facebook, is
| non-essential busywork that could easily be shut down
| tomorrow with no adverse effects to the platform. Firstly
| the entire executive class, just, they don't do shit to
| make Facebook the product. In fact I'd argue their
| decisions almost universally have made it worse as a
| product very consistently for its entire lifetime. Then,
| all the marketing people. There's just no goddamn reason to
| advertise Facebook (or reddit for that matter) the brand is
| so ubiquitous, if you actually found someone who'd never
| heard of it, I'd give you a large chunk of money. Add to
| that, if Facebook was doing a _good job_ of being what it
| ostensibly is, then people immediately become the best
| advertising, because people want to hang with people in
| these digital spaces. Then get rid of the people working to
| make Facebook addictive with dark patterns. Then get rid of
| the entire targeted ad division, because it's gross and
| inhumane. Pare the company down to engineers who build the
| product, and if anything, _expand_ the moderation team so
| they can actually ensure the safety of the platform, and of
| course the IT staff to back them. Now what does Facebook
| cost to operate?
|
| As far as I'm concerned, this pearl-clutching about "well
| websites have to make money" is grossly, grossly
| overstated. Websites don't cost that much to run. A ton of
| money is being siphoned off by the MBA parasites playing
| games in Excel all day. A ton more is being wasted
| developing features that advertisers want and users hate. A
| ton more is being funneled into making products
| artificially addictive to vulnerable people, to exploit
| them, so let's just not do that. And of course, leadership,
| rewarding themselves with generous compensation packages
| they aren't even remotely able to justify. _Now_ what does
| your website cost to maintain? Surely not nothing, and for
| websites of substantial size, it will still be high, but I'm
| willing to bet it's a hell, hell, hell of a lot less
| than it was before.
| kjkjadksj wrote:
| Part of the issue is that it isn't just the web, but the
| inevitable american corporate shareholder model. Even
| businesses could be mom and pop ified and made way more
| popular overnight: quit raising prices and cutting
| corners and it would actually stand for itself like a
| massive $7 burrito. However the expectation is that
| shareholders get returns. Costs must be cut. Prices must
| be raised. Margins must be improved. It doesn't matter if
| this eats the business alive, as shareholders are
| sufficiently leveraged. The whole system is incentivized
| to select for inferior quality and taking all the
| available money on the table.
| ToucanLoucan wrote:
| My rant above and your response reminded me of all those
| tons of MMO games out there that are ancient, with a tiny
| playerbase, that remain profitable nonetheless simply
| because if you have a product that people like using,
| putting it into maintenance mode and doing the bare
| minimum to keep it running is a perfectly valid business
| strategy. The companies that buy these service games and
| run them effectively just buy completed money printers
| and keep them operating. It's not going to make anyone
| rich probably, but it's a perfectly valid and profitable
| way to go about things.
|
| The silicon valley "grow at all costs, always evolve and
| innovate forever" model is so detached from the reality
| of most businesses in my experience.
| isoprophlex wrote:
| In biology, you'd call that a cancer. But to people
| praising the gospel of VC money, it's something
| desirable...
| Suppafly wrote:
| >The companies that buy these service games and run them
| effectively just buy completed money printers and keep
| them operating.
|
| I hadn't really thought about that topic in that way
| before. Really explains why some of those older MMOs have
| no desire to really make any improvements, the owners are
| happy to just keep them powered up and collect a check
| but have no incentive to invest in making them better.
| ToucanLoucan wrote:
| I think the notion that sometimes things are just "done"
| is incredibly undervalued in our industry. Frankly I wish
| a ton of games I play would STOP updating.
| Suppafly wrote:
| >I think the notion that sometimes things are just "done"
| is incredibly undervalued in our industry.
|
| I agree, but also the flip side is that things rapidly
| switch from 'done and working' to 'dead' pretty quickly
| if no one is willing to do minor maintenance.
| u8080 wrote:
| Yeah, like Rockstar with GTA V Online.
| lotsofpulp wrote:
| >Websites don't cost that much to run.
|
| Popular websites that allow user content to be uploaded
| or linked do cost that much to run, due to content
| moderation.
|
| There might be a small (relatively) forum here and there
| that a few good moderators are willing to slave away at
| keeping clean, but you will never see a website that
| allows user content with as many users as
| Reddit/Youtube/Instagram/etc be cheap.
|
| Although, due to AI, the cost to spam the small forums
| might be so small that even they might come into the
| crosshairs.
| megaman821 wrote:
| Although it is quite surprising that mainly text websites
| (Reddit, Twitter) are hard to run sustainably but video
| and image websites (YouTube, Instagram, TikTok) can
| because it is easier to sell ads against them.
| meiraleal wrote:
| how can we keep paying the ever-growing profits of multi-
| trillion dollar companies? This is insane.
| jsnell wrote:
| Reddit is 100x from being a trillion-dollar company, and
| is not profitable.
| meiraleal wrote:
| Reddit offers no magic, it's just a forum. Google used to do
| some magic decades ago and still profits from it.
| BeetleB wrote:
| I know people will hate to hear this, but Reddit is not
| important to the well-being of the Internet.
| TeaBrain wrote:
| I think it's the other way around, in that people don't
| like to hear how Reddit has become important due to the
| death of independent forums and the degree to which
| information has become concentrated on the site.
| BeetleB wrote:
| The death of independent forums has been greatly
| exaggerated.
|
| Of all the forums I used to be active in, many are still
| active. The ones that died did so because the community
| died (i.e. they did not shift to Reddit and the like).
|
| Reddit is great simply because it allowed _anyone_ to
| create a community. No need to get a LAMP stack and deal
| with security vulnerabilities in your forum SW.
|
| These days you have Lemmy and its ilk. Much higher
| barrier than the old LAMP stack, but also much superior
| to it. I do hope it takes off.
| fredgrott wrote:
| the article quotes reddit policy change: Reddit considers
| search and ads commercial activities and thus subject to
| robots.txt block and exclusion.
| ColinHayhurst wrote:
| Person extensively quoted in the article here. They are welcome
| to reach out. But not a single person from any level did that,
| nor replied to my polite requests to explain and engage. We
| first contacted them in early June and by 13th June, I had
| escalated to Steve Huffman @spez.
| toomuchtodo wrote:
| An acquaintance investigating Reddit's moderation
| mechanization inquired how a major subreddit was moderated
| after an Associated Press post was auto removed by automod.
| They were banned from said sub. They inquired why they were
| banned, and they disclosed that they would share any responses with a
| journalism org (to be transparent where any replies would be
| going, because they are going to a journalism org). They were
| muted by mods for 28 days and were "told off" in a very poor
| manner (per the screenshots I've seen) by the anonymous mod
| who replied to them. They were then banned from Reddit for 3
| days after an appeal for "harassment"; when they requested
| more info about what was considered harassment, they were
| ignored. Ergo, inquiring as to how the mods of a major sub
| are automodding non-biased journalism sources (the AP, in
| this case) without any transparency appears to be considered
| harassment by Reddit. The interaction was submitted to the
| FTC through their complaint system to contribute towards
| their existing antitrust investigation of Reddit.
|
| Shared because it is unlikely Reddit responds except when
| required by law, so I recommend engaging regulators (FTC, and
| DOJ at the bare minimum) and legislators (primarily those
| focused on Section 230 reforms) whenever possible with
| regards to this entity. They're the only folks worth
| escalating to, as Reddit's incentives are to gate content,
| keep ad buyers happy, and keep the user base in check while
| they struggle to break even, sharing as little information
| publicly as possible along the way [1] [2].
|
| [1]
| https://www.bloomberg.com/news/articles/2024-05-09/reddit-
| la... | https://archive.today/wQuKM
|
| [2] https://www.sec.gov/edgar/browse/?CIK=1713445
| ColinHayhurst wrote:
| The blocks for MojeekBot, a Cloudflare-verified and respectful
| bot for 20 years, started before the robots.txt file changes.
| We first noticed in early June.
|
| We thought it was an oversight too at first. It usually is.
| Large publishers have blocked us when they have not considered
| the details, but then reinstated us when we got in touch and
| explained.
| Closi wrote:
| > But I can understand why they made the change they did. The
| data was being abused.
|
| Depends how you see it - if you see it as 'their' data (legally
| true) or if you see it as user content (how their users would
| likely see it).
|
| If you see it as 'user content', they are actually selling the
| data to be abused by one company, rather than stopping it being
| abused at all.
|
| From a commercial 'let's sell user data and make a profit'
| perspective I get it, although it does seem short-sighted to
| decide to effectively de-list yourself from alternative search
| engines (guess they just got enough cash to make it worth their
| while).
| Ajedi32 wrote:
| > if you see it as 'their' data (legally true)
|
| Is that actually true? Reddit may indeed have a license to
| use that data (derived from their ToS), but I very much doubt
| they actually own the copyright to it. If I write a comment
| on Reddit, then copy-paste it somewhere else, can Reddit sue
| me for copyright infringement?
| jedberg wrote:
| They own a non-exclusive worldwide right to it. You own the
| copyright, they have a license to use it however they see
| fit.
| passwordoops wrote:
| Enough cash or enough data on hand to show the majority of
| traffic comes from the search monopoly
| ekidd wrote:
| I personally feel that this kind of "exclusive search only by
| Google deal" should result in an anti-trust case against
| Google. This is the kind of abuse of monopoly power that caused
| anti-trust laws to be passed in the 1890s.
| eddd-ddde wrote:
| if i create a vacuum cleaner and decide to only sell it at
| Walmart you can't get mad at me for not wanting to sell it at
| costco
|
| you can always buy a competitor's or make your own vacuum
| cleaner if you hate buying at Walmart
|
| maybe what you are really mad about is Reddit monopolising
| content
| ekidd wrote:
| Usually, to trigger any kind of anti-trust law, you need to
| have massive market share. In this case, for example,
| Reddit almost certainly hasn't committed any antitrust
| violations, because they're a relatively minor player in
| their market.
|
| Similarly, if you start a vacuum cleaner company, you can
| make whatever exclusive deals you want. But if you control
| 80% of the market for vacuum cleaners, then you might need
| to be more careful about leveraging your market share in
| unfair ways.
|
| If a company is part of a robust, competitive market (like
| Reddit), it's usually wiser to let customers vote with
| their wallets, and leave the government out of it. If a
| company becomes massively dominant (like Google or
| TicketMaster), and if it starts pushing exclusive
| contracts, it's much harder for customers to switch away.
| PaulRobinson wrote:
| This is great. It means I won't see Reddit content popping up all
| over search results in other engines. Can Medium do the same? And
| perhaps Quora?
| lfkdev wrote:
| Yeah awesome, reddit was one of the last useful results beside
| the spam blogs and ai generated articles.
| bdjsiqoocwk wrote:
| What a weird thing to say. Reddit has for a long time been a
| place where real people hang out and have real conversations,
| unlike quora and medium.
| MattPalmer1086 wrote:
| It's not strange to me. Every single time I've followed a
| Reddit link from search results, I've got a short and fairly
| useless conversation that doesn't help me at all. So I have
| never understood why people like it.
|
| Obviously, people do see value in it, or they wouldn't keep
| saying so! I would happily exclude Reddit links from search
| results though.
| candiddevmike wrote:
| I think Reddit lost that kind of authenticity a while ago.
| Advertisers know the "site:reddit.com <product>" trick, and
| when you look at the number of upvotes, it costs _pennies_ to
| get your product trending in the comments.
| Suppafly wrote:
| I don't search reddit for <product> though I search it for
| <highly technical issue with product> because reddit is the
| only place where real people discuss such issues and the
| solutions to them.
| VancouverMan wrote:
| > where real people hang out and have real conversations
|
| I don't consider the discussions there to be "real" in any
| meaningful way, thanks to the extensive moderation.
|
| From what I've seen, there typically ends up being a small
| handful of moderator-enforced narratives that are deemed
| "acceptable" for a given subreddit, and any commenters
| deviating from those narratives get banned, or their comments
| end up as "[removed]" by "[deleted]", or the comments get
| obscured with the "comment score below threshold" notice.
|
| It's generally some of the most one-sided and blandest
| discussion around. Given that there's often no meaningful
| back-and-forth involving differing perspectives of any sort,
| I'm not even sure if it should be considered "discussion".
| It's more like regurgitation and repetition.
|
| I've found the situation to be particularly bad on the
| Canadian locale-specific subreddits, for example, but
| enough of the tech-oriented ones I've seen seem to end up
| like that, too.
| psunavy03 wrote:
| Yeah, but each sub to a greater or lesser degree, has its own
| hivemind you'll be run out of town (or possibly even banned)
| for challenging. And the average member of Reddit is quite
| willing to spout off confidently incorrect BS and downvote
| people into the ground who actually know what they're talking
| about.
|
| Not exactly always a reliable source of info outside
| uncontroversial niche topics or places like /r/AskHistorians
| that actually moderate. And even there I've seen the
| occasional humdinger.
| kingnothing wrote:
| What use do you get out of a search engine if not searching for
| reddit and other forums? The rest of the internet has become a
| cesspool of useless AI generated crap.
| kevincox wrote:
| To be fair Reddit threads are more and more often getting
| filled with useless AI generated crap as well.
| jjulius wrote:
| To be fair, Reddit has plenty of astroturfing, too.
| jonpurdy wrote:
| FYI, Kagi lets you do this and personalize it as you desire.
| They even share aggregated stats* about which domains users
| choose to block/lower. (Mine generally match these stats.)
|
| * - https://kagi.com/stats?stat=leaderboard&k=-2
| WarOnPrivacy wrote:
| > Kagi lets you do this and personalize it as you desire.
|
| Kagi shill here. Are they finally applying filters and
| operands to image searches?
|
| Asking because it was a tough year seeing Pinterest as top
| filter choice _and_ top result in images (when set as
| filter=block).
|
| (edit: I just tried searching->image: beautiful quilt
| patterns. I didn't spot any Pinterest results!)
|
| I have never understood why DDG, etc steadfastly refuse to
| obey operands in image searches. Most days. Every blue moon
| operands seem to work. I think.
|
| sidebar: Yesterday I saw Yandex obey quotes in a web search.
| It was the 1st time I've seen that.
| hugh_kagi wrote:
| > Are they finally applying filters and operands to image
| searches?
|
| That was a bug, apologies. It should be fixed now.
| troyvit wrote:
| Kagi lets you configure the search engine to deprioritize or
| even fully eliminate search results. They ride on the back of
| Google's indexing so -- if you ever change your mind -- you
| could bring reddit searches back.
| Suppafly wrote:
| >This is great. It means I won't see Reddit content popping up
| all over search results in other engines.
|
| Honestly, that makes those other engines way less valuable
| because for many topics, telling the engine to specifically
| narrow the results down to reddit comments is the only way to
| get a decent answer to what you're looking for. I'd definitely
| support blocking Quora from everything though.
| rkangel wrote:
| Interesting. I have long found Reddit to be an excellent
| source of solutions to problems. Stack Overflow usually beats
| it for programming specific stuff, but for everything else
| usually the most helpful answer comes from Reddit. It's a real
| person, helping another real person with a real problem.
| nomilk wrote:
| Suppose a crawler or rival search engine doesn't respect
| robots.txt, reddit can't stop them. Make it a bit trickier, yes,
| but not stop them.
| eschneider wrote:
| It is evidence that they didn't have permission if you sue
| them.
| kingnothing wrote:
| There's no grounds on which to file suit. The 9th circuit
| court found web scraping is legal.
|
| https://techcrunch.com/2022/04/18/web-scraping-legal-court/
| tagawa wrote:
| This is not even scraping - it's just crawling and
| indexing.
| miyuru wrote:
| reddit blocked datacenter IPs even before this change.
| nomilk wrote:
| Could a motivated scraper not buy IPs/proxies that aren't in
| those ranges, i.e. to blend in with general users?
| xeromal wrote:
| Just like every security feature in the physical and
| digital worlds, security just inconveniences honest people
| and the cost to bypass reduces the amount of people who
| try.
|
| Eventually it becomes expensive to scrape reddit's data and
| most people will stop.
| Manuel_D wrote:
| Proxy IPs are also known and typically blocked. In fact,
| you can't even browse reddit without logging in when
| connected to most proxies.
|
| Many web scraping companies have loads of phones hooked up
| in a rack in order to use mobile IPs. Companies can't just
| block mobile IPs because their site would become unusable
| for several city blocks (mobile IPs often correspond to a
| specific cell tower). This is the face of modern web
| scraping: https://i.imgur.com/U2RXi5G.jpeg
| tempfile wrote:
| Hopefully this paves the way for antitrust action, but I won't
| hold my breath.
|
| Reddit's justification for this is profoundly wrong. Their
| "public content policy" is absurd doublespeak, and counter to
| everything the open internet is and hopes to be. You cannot
| simultaneously call yourself "open" and "public" while refusing
| access to automated clients. Every client is automated. They even
| go so far as to say that "crawling" (also known as "downloading")
| is an "abuse" and violates user privacy.
|
| This is absurd, and not justified. I would love to see
| legislation that restricted server operators' ability to prohibit
| automated access in this way, but I suppose it will never happen.
| Some people in this thread have attempted to justify the policy
| by saying "they have to protect their income streams". No they
| don't. You don't have a right to an income stream, and you
| certainly don't have a right to lie in order to get all the
| benefits of an open internet with none of the downsides. Noting
| of course that the "downsides" are in this case actually just
| "competitors".
| semiquaver wrote:
| Sorry, what is the antitrust concern about Reddit blocking
| crawlers that aren't paying them? Surely you don't think Reddit
| has a monopoly on anything?
|
| Or are you somehow suggesting that it's google's fault that
| Reddit took this step? I don't see any indication that's the
| case.
| em-bee wrote:
| not that reddit has a monopoly, but that google has.
|
| google is using their power to prevent others from competing.
|
| the problem here is of course that if reddit would be in
| financial trouble (i don't know if they are but let's imagine
| they need this money), they'd be between a rock and a hard
| place.
|
| google should not be allowed to make exclusive deals, and
| reddit could not survive without the deal, then what would be
| left? google buys reddit, or the relevant authority approves
| of the deal?
|
| i thought about the same problem with firefox. let's assume
| firefox is forced to allow people to make a choice of the
| default search engine (just like microsoft was forced to
| allow a choice of default browser on windows) then google
| might stop paying mozilla, and they could end up in financial
| trouble.
|
| ideally no company ever depends on a single other company
| that much, but that only works if we don't allow companies to
| grow that much in the first place.
| ColinHayhurst wrote:
| > let's assume firefox is forced to allow people to make a
| choice of the default search engine
|
| let's assume apple is forced to allow people to make a choice
| of the default search engine in safari then google might
| stop paying apple, and ...
| tempfile wrote:
| surely firefox is the more interesting example, since
| they have orders of magnitude less alternative revenue?
| asadotzler wrote:
| > let's assume firefox is forced to allow people to make a
| choice of the default search engine
|
| Firefox has always allowed people to make a choice of the
| default search engine, since before it was even called
| Firefox. I know. I was there building it.
| em-bee wrote:
| yes, but the default is google, and you have to go into
| the settings to make a choice, so most people keep the
| default. what i meant was the EU directive for microsoft
| where they actually had to put up a prompt at first use
| asking the user which browser they want, without allowing
| any default (and, i am not sure, maybe even a randomized
| list)
|
| if the same was done for search engine choice for firefox
| then google would no longer be the default, and they
| would have no reason to pay firefox for that.
| tempfile wrote:
| Yes, sorry, should have been more clear: I claim google is in
| a monopoly position, not reddit. The rest of the comment is
| unrelated ranting about reddit's betrayal of their
| previously-held "public data is public" position.
| r_singh wrote:
| I wonder how Aaron Swartz would react to this
| geodel wrote:
| My guess is he'd freak out once he'd hear that lawyers, law
| enforcement may get involved on this issue.
| ykonstant wrote:
| It's ironic, because Reddit is the only search engine that works
| on Google now thanks to shittening.
| maxwell wrote:
| They're both running on fumes at this point.
| riiii wrote:
| Also sniffing them.
| voisin wrote:
| Makes sense that Google did this deal since their search quality
| tanked and they became a de facto front end UI for Reddit.
| NoMoreNicksLeft wrote:
| Up until 2016 (I think, +/- 1 year), if you could remember 3
| uncommon words in a comment, you could find any reddit post
| instantly on Google. I'd want to follow up on a thread from
| weeks ago, and it was magic. Number one result. Then one day
| that just stopped working, and even adding site:*.reddit.com
| didn't fix it. At the time, I think, I didn't realize that it
| was mostly Google's fault, I thought maybe Reddit had changed
| their infrastructure so that it couldn't be crawled properly.
|
| Google hasn't been a search engine in a long while, it's just
| an advertisement engine now.
| dev1ycan wrote:
| it's so bad it's crazy, you can legit not find stuff on the
| internet anymore, it's the same with youtube, I search
| something and get like 20 or so results and then everything
| else is hidden.
|
| it started when youtube removed the ability to search for
| videos older than 5 years, if I had to guess? cost saving,
| have every old video in cheaper storage... but it sort of
| fragments youtube, every couple of years you only get newer
| content.
| LegitShady wrote:
| "we noticed that since our search results had gotten so bad
| nobody can use them to find the things they want, people just
| kept adding "reddit" to search terms anyways, so we figured we
| might as well make it official and exclusive"
| lowbloodsugar wrote:
| Funny that source of TFA blocked me from reading the whole thing.
| roughly wrote:
| Boy, the LLMs have really been an apocalypse moment for the web,
| haven't they? Between hoovering up and monetizing every bit of
| content they can without any attribution or compensation and the
| absolute flood of mediocre generated content, they've really done
| in the last straggling remains of the open internet.
|
| It's not like everyone wasn't already pulling the same grift, but
| quantity really does have a quality all its own.
| imglorp wrote:
| Of course, we have to be careful not to villainize a neutral
| tech. Instead let's call it what it is: unchecked capitalism
| and monopolistic behaviors.
|
| Capitalism seems to work ok for the common good until you
| remove all the protections. LLMs provide a de facto monopoly for
| the owner which must already be a near monopoly: they take vast
| resources to train; only a giant corp can afford to buy all the
| content and provision enough resources to train one.
|
| LLM did not enshittify what's left of the internet, greed did
| it.
| mediumsmart wrote:
| that is awesome but I can't open old.reddit.com in my browser so
| it's a non-issue.
| daft_pink wrote:
| I don't understand how this isn't anti-competitive behavior. It
| seems like reddit has to offer this deal with similar terms to
| google's competitors.
| talldayo wrote:
| They do offer that deal to others; a big news story was when
| OpenAI bought Reddit's data they were selling:
| https://openai.com/index/openai-and-reddit-partnership/
| dathinab wrote:
| yep, but for things which are "only" search engines it's not
| a viable offer. Only if you expect "big AI business value"
| from it does it make sense, maybe.
| Suppafly wrote:
| Most business deals are anti-competitive in some way. What
| makes you think this specifically rises to the level where
| they'd _legally_ have to offer similar terms to competitors?
| carlosjobim wrote:
| Why in the world would they have to do that? There are
| thousands of exclusive business-to-business deals being signed
| into action every second of the day.
| eddd-ddde wrote:
| I don't see how this tracks at all. Companies can decide to
| only sell their products with some retailer if they want. You
| can't force them to make deals with other companies.
| gtirloni wrote:
| You certainly can in monopoly situations (which apparently
| isn't the case here).
| mutatio wrote:
| It's funny in the context of Google's past motto of "don't be
| evil". I feel the right thing for Google here would have been to
| decline any deal regarding exclusivity, then Reddit wouldn't have
| pulled the trigger with its robots.txt update. The entire
| manoeuvre required both parties.
| peddling-brink wrote:
| Google should abandon its mission to "organize the world's
| information" because doing so requires spending money for
| valuable data, and others might not want to spend that money?
| tbeseda wrote:
| https://archive.li/GS2I0
| dathinab wrote:
| Worse, it doesn't even really "work" anymore, given how most
| searches are flooded with garbage SEO results and paid
| advertisements "basically" looking like search results (most of
| the time these are more garbage rather than what you are looking
| for; in the cases where they aren't, it is quite often along the
| lines of "Google's algorithm blackmailing companies into buying
| ads for users who want to find them through Google but wouldn't
| without ads").
|
| I wonder if this might affect Reddit, as in slowly kill its user
| base, especially the users who provide (and often also look for)
| high-quality content, because which of those users would want to
| use Google search?
| john-radio wrote:
| > Worse, it doesn't even really "work" anymore, given how most
| searches are flooded with garbage SEO results and paid
| advertisements "basically" looking like search results ...
|
| I don't understand what you're saying. That's exactly why
| people append `site:reddit.com` to their searches in the first
| place, because those search results typically aren't like that.
| wwweston wrote:
| Or at least, reddit posts and comments that _are_ content
| messaging / marketing (human or AI) fit in better with
| earnest and natural posts, so that they're more effective.
| wtf242 wrote:
| This problem is only going to get worse. For my
| thegreatestbooks.org site I used to just get indexed/scraped by
| Google and Bing. Now it's like 50+ AI bots scraping my entire
| site just so they can train an LLM to answer questions my site
| answers without having a user ever visit my site. I just checked
| Cloudflare and in the past 24 hours I've had 1.2 million
| bot/automated requests.
| sct202 wrote:
| There's a new setting in Cloudflare to block AI/scraper bots.
| https://blog.cloudflare.com/declaring-your-aindependence-blo...
| venkat223 wrote:
| Google is selfish
| StrauXX wrote:
| IANAL but as far as I understand the current legal status (in the
| US) a change in robots.txt or terms and conditions is not binding
| for web scrapers since the data is publicly accessible. Neither
| does displaying a banner "By using this site you accept our terms
| and conditions" change anything about that. The only thing that
| can make these kinds of terms binding is if the data is only
| accessible after proactively accepting terms. For instance by
| restricting the website until one has created an account.
| LinkedIn lost a case against a startup scraping and indexing
| their data because of that a few years ago.
| jpalomaki wrote:
| Quite sure they are also enforcing these with some technical
| measures to limit scraping.
| renlo wrote:
| As was LinkedIn, which was forced to stop rate-limiting /
| IP-banning scrapers of public pages.
| qingcharles wrote:
| At the federal level; but states have their own laws. For
| instance, it can get you 5 years in prison in Illinois to
| violate a web site ToS.
|
| https://www.ilga.gov/legislation/ilcs/ilcs4.asp?DocName=0720...
| redcobra762 wrote:
| Has anyone ever successfully been prosecuted for violating
| this statute?
| numbers wrote:
| "Information is power. But like all power, there are those who
| want to keep it for themselves. The world's entire scientific and
| cultural heritage, published over centuries in books and
| journals, is increasingly being digitized and locked up by a
| handful of private corporations." - Aaron Swartz (2008)
| Khelavaster wrote:
| robots.txt isn't legally binding. Can Reddit really force Bing
| not to crawl it..?
| melodyogonna wrote:
| Wait that's actually terrible.
| bitpush wrote:
| When Microsoft strikes an exclusive deal with OpenAI to use their
| models, it is a smart, brilliant, clever move.
|
| When Apple strikes an exclusive deal with suppliers for parts, it
| is sound business practice.
|
| When Google strikes an exclusive deal with Reddit, it is ..
|
| Some of you have no idea how businesses work, and it shows.
| riku_iki wrote:
| > When Google strikes an exclusive deal with Reddit, it is ..
|
| It's because reddit is selling content created by users, based
| on promises that reddit supports open internet, open data, etc,
| without their consent and without sharing revenue, which may be
| legal but is likely not ethical.
| bitpush wrote:
| Let's get specific. You're confusing copyright with
| licensing.
|
| The users hold the copyright (reddit claim that they made the
| meme) but reddit has the non-exclusive right to redistribute
| and license the content.
|
| Two different things.
| riku_iki wrote:
| > but reddit has the non-exclusive right to redistribute
| and license the content.
|
| that's what I said: it is legal.
| arnaudsm wrote:
| I understand the AI context, but this is dangerously
| anticompetitive for other search engines.
|
| This is a dangerous precedent for the internet. Business
| conglomerates have been controlling most of the web, but refusing
| basic interoperability is even worse.
| zooq_ai wrote:
| There is nothing preventing search companies paying the same
| $60 Million to license content.
|
| If reddit had an exclusive agreement, it would be anti-competitive.
|
| This is classic HN anti-Google tirade (and downvoting facts,
| logic and concepts of free market)
| pluc wrote:
| Paying 60 million to every site you want to index is also a
| bad precedent to set. Why can Reddit get paid and XYZ can't?
| zooq_ai wrote:
| Anyone can ask for a licensing deal. I'm sure NY Times, Condé
| Nast all have licensing deals. Mr. Beast signed a deal with
| Amazon. Joe Rogan with Spotify. Why is it hard to
| understand?
|
| Even HN can get a licensing deal if they want to.
|
| If you are producing content, you have every right to do
| what you want to with the content.
| SlackingOff123 wrote:
| Reddit is not producing any content; its users are.
| zooq_ai wrote:
| Not the point. If users don't like it they can go
| somewhere else to post.
|
| For practical purposes, reddit can do whatever they want
| with users' posts. It's right there in the TOS.
| renewiltord wrote:
| Users sign a deal to give Reddit the content.
| spixy wrote:
| maybe Reddit has more value than XYZ?
| not_wyoming wrote:
| > There is nothing preventing search companies paying the
| same $60 Million to license content.
|
| Yes, actually, there is - having $60m to throw around.
|
| "Barriers to entry often cause or aid the existence of
| monopolies and oligopolies" [0]. Monopolies and oligopolies
| are definitionally the opposite of free market forces. This
| is quite literally Econ 101.
|
| [0] - https://en.wikipedia.org/wiki/Barriers_to_entry
| saghm wrote:
| Not to mention the fact that if this became commonplace,
| other websites might start charging as well
| zooq_ai wrote:
| Having a Monopoly != Anti-Competitive.
|
| Having Barriers to Entry != Anti-competitive
|
| Yes, large players have advantages of Economies of Scale.
|
| Just because you can't run an Airline because you don't
| have money to buy an Airplane isn't anti-competitive.
|
| Today Microsoft, Apple, OpenAI, Google, Amazon all can
| afford those piddly $60m to license from reddit.
|
| Not Anti-competitive at all.
|
| But saddened by how much corporate-hate by HNers destroys
| their credibility in debating these things.
|
| Go ahead downvote
| not_wyoming wrote:
| If you check citations, you'd find the sentence preceding
| my excerpt on barriers to entry:
|
| "Because barriers to entry protect incumbent firms and
| restrict competition in a market, they can contribute to
| distortionary prices and are therefore most important
| when discussing antitrust policy."
|
| Antitrust policy then links to a page on competition law:
| "Competition law is the field of law that promotes or
| seeks to maintain market competition by regulating anti-
| competitive conduct by companies." [0]
|
| So yes, I'd downvote you if I could, but HN doesn't allow
| downvotes - which is honestly pretty fitting in the
| context of this conversation.
|
| [0] - https://en.wikipedia.org/wiki/Competition_law
| zooq_ai wrote:
| Once again buying an Airplane and starting an Airline
| business has probably the highest barrier to entry. Yet
| the Airline industry is the most competitive.
| not_wyoming wrote:
| The air travel industry has also seen some of the most
| significant government regulation in the form of blocking
| mergers (ie monopolistic, anticompetitive behavior) -
| meaning that competition in the airline space is due to
| regulation, not free market dynamics alone.
|
| I'm happy to continue this debate if you'd like to start
| supporting your posts with citations but probably won't
| engage further unless you do. Have a great day!
| try_the_bass wrote:
| And yet "free market forces" are often the reason why
| monopolies and oligopolies arise...?
|
| Monopolies are entirely consistent with free market
| economics. After all, if there's clearly a "best product"
| for a particular niche, it's entirely rational (free market
| actor) behavior for everyone to use the same product,
| leading to its monopoly in that market segment.
|
| I don't understand why people think this isn't/won't
| be/shouldn't be a common result of "free market forces".
| not_wyoming wrote:
| > Monopolies are entirely consistent with free market
| economics.
|
| This is a fair critique. I'm approaching this from an
| admittedly American perspective in which "free market"
| colloquially implies competition - but I recognize that
| competition is not inherently a free market concept.
|
| Good callout!
| GuB-42 wrote:
| Microsoft can throw around $60M, and Bing is used by most
| of the "alternative" search engines.
|
| It doesn't solve the problem, but if money is the only
| thing preventing search engines from accessing Reddit, then
| what goes for Google also goes for Microsoft.
| thih9 wrote:
| Story / rant warning.
|
| I remember seeing an unhelpful hyperlink for the first time. It
| was a random word in the body of a random tech site that
| redirected to a list of articles from that site tagged with that
| term.
|
| I remember being stunned, my expectation was that the link would
| lead me to another website, one that would be an authoritative
| source on that term and freely accessible.
|
| 20 years later we get a paywalled article about fragmented web -
| and we're not slowing down.
| lmeyerov wrote:
| FWIW, we inquired to the reddit sales team about paying for data
| sometime last year, as we do similar elsewhere for use cases like
| helping emergency responders, and even though they were launching
| the program and asking for customers... no email back. Nor on our
| second and I think third attempt.
|
| I'm not sure what to make of that.
| morkalork wrote:
| How much were you willing to pay? Still, rude of them not to
| even discuss the issue. Every time I've gone to buy data, if
| I'm too small of a fish, vendors have always been happy to refer
| me to a reseller.
| lmeyerov wrote:
| We do 4-6 figures/yr for providers which is normal in our
| world
|
| An enterprise sales team with only 1 customer happens (eg,
| Mozilla's search bar), but... That's surprising here, and
| scary as a sustainable & scalable business. Ignoring 5-6
| figure/yr inquiries says a lot to me. In contrast, we did
| that same-day with Twitter without talking to anyone.
| heisenbit wrote:
| Certainly rude but also possibly legally problematic. If they
| were judged to be in a dominant position in a market and were
| found making deals with exclusivity then it can get
| expensive.
|
| It all depends of course what the market is. If one looks at
| reddit not as a whole but as a collection of niches then one
| could imho find niches where reddit has a dominant knowledge
| position.
| jumploops wrote:
| IIRC, GPT-2 was primarily trained on Reddit[0]
|
| [0] https://www.reddit.com/r/ChatGPT/comments/133xgb5/gpt2_was_p...
| neilv wrote:
| I'm concerned multiple ways by this, but I also could see some
| positive fallout from this, if it sets precedents that help
| protect 'content' owners from AI goldrush companies just taking
| everything.
| gtirloni wrote:
| AI companies are the least of our worries in the Reddit
| situation. The fact that Reddit has full control of user-
| generated data to do as they please gives them freedom to do as
| they please. I think this is the crux of today's issue.
|
| AI companies like Google, Microsoft and OpenAI have deep
| pockets to 'unprotect' themselves from anything. The barrier to
| entry is for small AI companies and those aren't really making
| an impact currently.
| r_singh wrote:
| Thinking from Reddit's perspective, they have nothing to lose
| really. It's not like other search engines are going to pay any
| attention to the robots.txt and Google's AI would have still
| scraped data from Reddit regardless of the deal. Now they will
| just feel less bad about not citing sources possibly, depending
| on the user experience they want to deliver.
| dbg31415 wrote:
| Every time I think, "How scummy..." Reddit always finds another
| way to go lower.
| earthboundkid wrote:
| They literally think the scissor statement is a real thing that
| will really work, fml.
___________________________________________________________________
(page generated 2024-07-24 23:09 UTC)