[HN Gopher] Searx - Privacy-respecting metasearch engine
___________________________________________________________________
Searx - Privacy-respecting metasearch engine
Author : cube00
Score : 243 points
Date : 2021-11-12 12:17 UTC (10 hours ago)
(HTM) web link (sagrista.info)
(TXT) w3m dump (sagrista.info)
| YXNjaGVyZWdlbgo wrote:
| I had a searx instance running for a long time and it's great
| when it works, but the plugins for site-specific searches break
| all the time, and if you have more than 3-4 users with a high
| search frequency, Google blacklists your IP by throttling it.
| kaba0 wrote:
| May I ask what your idea of preserving privacy is when self-
| hosting a website that searches for specific terms from a
| presumably fixed IP, where with 1/3-1/4 probability a query
| can be attributed to you?
| YXNjaGVyZWdlbgo wrote:
| The instance ran on a mobile connection not associated with
| any private information.
|
| EDIT: It was located in a German street light 20km away from
| any of the users. Just to get the geolocation question out of
| the way.
|
| It was more of an experiment than anything else. There will be
| a talk about it and other FreiFunk (an open mesh network in
| Germany) related stuff at the next virtual CCC congress.
| dalf wrote:
| Did you renew the mobile IP from time to time?
|
| Note: since version 1.0, searx stops sending requests
| for one day when a CAPTCHA is detected, which might help a
| little.
|
| (I'm really interested in the results of your experiment)
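|
| A minimal sketch of that kind of per-engine suspension logic
| (illustrative only, not searx's actual code; send_request is
| a stand-in for the engine request):
|
|     import time
|
|     SUSPEND_SECONDS = 24 * 3600   # back off for one day
|     suspended_until = {}          # engine name -> unix timestamp
|
|     def report_captcha(engine):
|         # On CAPTCHA detection, stop querying this engine for a day.
|         suspended_until[engine] = time.time() + SUSPEND_SECONDS
|
|     def query(engine, send_request, q):
|         if time.time() < suspended_until.get(engine, 0):
|             return []             # engine still suspended
|         results, captcha = send_request(q)
|         if captcha:
|             report_captcha(engine)
|             return []
|         return results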
| YXNjaGVyZWdlbgo wrote:
| We renewed every 12 hours. In the end we came to the
| conclusion that Google might discriminate against traffic
| from eastern European states. The SIM cards were mostly
| from Poland, Belarus and Ukraine. When we tested against
| German, French and Italian cards, CAPTCHAs were way more
| frequent on the eastern cards and showed up way earlier. I
| will ping you as soon as the talk is online.
| ahtaarra wrote:
| I recently made the switch from DDG to Searx simply because
| right-clicking on a search result to copy the URL resulted in a
| referrer link being copied rather than the link of the result
| destination.
| southerntofu wrote:
| I think enabling the DoNotTrack header or disabling JavaScript
| prevents this behavior: I cannot reproduce it on Tor Browser.
| But you are correct, this is a worrying development.
| caskstrength wrote:
| I use https://addons.mozilla.org/en-US/firefox/addon/clearurls/
| to automatically convert referrals to actual links on web
| pages.
| ravenstine wrote:
| I've been using ClearURLs for a while so I never even
| realized that DDG used referrer links. Have they always done
| this?
| boomboomsubban wrote:
| Isn't the referral link only on the ad results, which is
| clearly marked "AD"? That's what my quick test shows.
| ojosilva wrote:
| Are there any open-source real internet search engines worth
| looking at? I think we should be working on disrupting search as
| a whole instead of depending on the Googles, Baidus and Bings of
| the world.
|
| I'm fully aware of the massive crawling and storage requirements,
| but open-source projects that can get search right can later 1)
| be hosted by the powerhouses of the cloud or non-profit parties,
| or 2) become a fully distributed hosting and crawling effort as
| in p2p and blockchain.
| [deleted]
| vindarel wrote:
| p2p: there's the YaCy effort: https://yacy.net/ I couldn't
| find a portal to try it out (I did years ago; the results
| need... to be discussed. It's anyway easy to install and to
| choose what part of the web to crawl.)
|
| > YaCy is free software for your own search engine.
|
| maybe they rebranded and don't aspire to be a complete web
| search engine?
| goldsteinq wrote:
| The problem with self-hosted search engines is that they make
| you _very_ unique: you're the only client of the "backend"
| engine with that (static and non-NATed) IP. Furthermore, you're
| now one of the small group of people with "hosting" IPs. Using
| self-hosted SearX may make you easier to track, not harder.
|
| Using SearX hosted by someone else is marginally better, but now
| you have to trust the owner of the server, which is probably not
| what you want for a privacy-centered search engine.
| woodruffw wrote:
| Could you clarify where the privacy concern is here? As I
| understand it, I'm sharing my IP with search engines anyways;
| the only difference with a self-hosted SearX instance is that
| I'm sharing my server's IP instead.
|
| Is the concern that the latter's IP isn't behind a NAT, and
| therefore is more unique? If so, I think that's the least
| concerning of the identifying datapoints that a search engine
| has access to -- my browser metadata is far more identifying.
| With SearX, that information doesn't get forwarded (IIUC).
| marc_abonce wrote:
| If you don't want to expose your IP address, you can configure
| searx to proxy all the queries through Tor. This obviously
| makes the instance way slower and you'll have to disable some
| engines that block Tor exit nodes, so it's a trade-off.
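|
| At the HTTP level that amounts to sending engine requests
| through a SOCKS proxy. A minimal Python sketch, assuming Tor
| is listening on its default port 9050 (searx has an
| outgoing-proxy setting for this, if I recall correctly):
|
|     import requests  # needs: pip install requests[socks]
|
|     # "socks5h" makes Tor resolve DNS too, so lookups don't leak.
|     TOR_PROXY = {
|         "http": "socks5h://127.0.0.1:9050",
|         "https": "socks5h://127.0.0.1:9050",
|     }
|
|     r = requests.get("https://check.torproject.org/api/ip",
|                      proxies=TOR_PROXY, timeout=30)
|     print(r.json())  # {"IsTor": true, "IP": "..."}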
| randomsilence wrote:
| When you click a link on your SearX instance and you don't use
| referrers, how can anybody track you? Nobody knows that you are
| coming from your "backend" engine.
|
| You just reveal your search queries to the hosting provider if
| he maliciously intercepts them.
| drcongo wrote:
| As someone who uses Safari with its built-in list of search
| providers, I'm rather stuck with DDG for address bar searches,
| but boy it has really started to suck over the last year or two.
| boomboomsubban wrote:
| You could always do !searx
| drcongo wrote:
| True! I'm experimenting with Ecosia right now, but also
| recently got on the beta for the kagi.com search engine, which
| so far has proven to be vastly superior to any other that I've
| tried.
| bombadilo wrote:
| So don't use Safari then? Seems like a pretty simple solution.
| sixothree wrote:
| Edge is available for Mac.
| drcongo wrote:
| Chrome destroys the battery on my laptop and is basically
| spyware these days; the Chrome-alikes are all dreadful -
| Vivaldi has the jankiest UI ever, Brave is unbearably
| sluggish, and Edge runs processes that I didn't ask for, like
| Microsoft Updater, which bugs me constantly and spams the new
| tab screen with all sorts of low-rent junk that I can't
| remove.
|
| Firefox is my development browser and I do really like it,
| but Safari is my actual browsing browser because it's by far
| the best browsing browser on Macs.
| xanaxagoras wrote:
| I've landed on Librewolf for personal, ungoogled chromium
| for work. It's great so far, been on this setup for a few
| months.
| boogies wrote:
| If you stop using macOS you can get much better frontends
| for WebKit, from the simple, rather Safari-like GNOME Web
| (AKA Epiphany) to the powerful Pentadactyl-like luakit.
| ColinHayhurst wrote:
| Yes, Safari is the worst offender when it comes to offering
| search choice on desktop.
|
| On iOS, using a new app called Hyperweb, you can use the new
| Safari extensions to access and create a longer preferences
| list. https://hyperweb.app/
|
| We really shouldn't have to choose this or that, but should be
| able to easily use multiple choices in search. You can do that
| today as explained here, but you'll need to switch browsers.
| https://blog.mojeek.com/2021/09/multiple-choice-in-search.ht...
| jeroenhd wrote:
| > with its built in list of search providers
|
| TIL. That's just... terrible UX.
| jqpabc123 wrote:
| _You do not need to trust third parties to keep you private and
| not track your every move, which is awesome._
|
| The only way to avoid third parties is to run your own server
| ... but this "metasearch engine" is basically just an
| aggregation proxy. So every search can still be traced back to
| your proxy server by Google, Bing or whoever is providing the
| actual results.
| ricardo81 wrote:
| 'Uses Amazon Web Services (AWS) as a cloud provider and
| Cloudflare CDN.'
|
| IIRC DDG now uses Microsoft servers exclusively. Makes sense
| given the volume of queries they're handling, all dependent on
| the Bing API.
| AmosLightnin wrote:
| Privacy is important, but I also care about the terrible quality
| of search results I get from nearly all the major providers
| these days. Couldn't an aggregator like SearX host a machine
| learning layer that learns which results are more likely to be
| valuable to me, and ranks them higher? Keeping the customization
| layer on my own server while improving search results would seem
| to be a big advantage, both privacy- and performance-wise.
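|
| Nothing stops a self-hosted aggregator from doing that. A toy
| sketch of such a personal ranking layer (hypothetical scoring,
| not an existing SearX feature):
|
|     from urllib.parse import urlparse
|
|     # Per-domain preference weights: >1 boosts, <1 demotes. In a
|     # real system these would be learned from your own clicks.
|     DOMAIN_WEIGHTS = {"stackoverflow.com": 2.0,
|                       "www.pinterest.com": 0.1}
|
|     def rerank(results):
|         # results: [{"url": ..., "title": ..., "score": ...}, ...]
|         def personal_score(r):
|             host = urlparse(r["url"]).hostname or ""
|             return r["score"] * DOMAIN_WEIGHTS.get(host, 1.0)
|         return sorted(results, key=personal_score, reverse=True)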
| analyte123 wrote:
| If you're trying to be comprehensive, a few other suggestions in
| rough order of their usefulness:
|
| Gigablast.com - Has been improved recently. private.sh is
| supposed to be a private proxy for Gigablast, but it has been
| broken recently
|
| Exalead.com - run by a French defense contractor for some reason
|
| filepursuit.com - search for files only. Need to play around with
| it more.
|
| PeteyVid.com - multi-platform video search
|
| Wiby.me - focus on "classic" style web sites
| jefc1111 wrote:
| I have been trialling Swisscows (https://swisscows.com/) and have
| found it quite useful. I have not deeply researched their privacy
| claims, but for now I am just trying to not use Google or
| mainstream alternatives.
|
| Does anyone else have experience or comments on Swisscows' search
| engine? Seems like an interesting company all round.
| hans_castorp wrote:
| Qwant is another (non-US) alternative
| rahen wrote:
| Doesn't it use Bing results, like DuckDuckGo does? I read it's
| hosted on MS Azure like Bing and DDG, so in the end it's
| somewhat just a rebranded Bing. Quite a shame for a European
| (Franco-German) search engine.
| kaba0 wrote:
| Even if they were just a proxy behind Bing, that would
| still be good, wouldn't it?
| rahen wrote:
| Well, claiming to offer an alternative to the GAFAM while
| depending on their products / data / infrastructure is a
| bit misleading.
|
| So indeed they're doing okay privacy-wise, but a lot of
| users feel cheated when they realize their "independent
| search engine" (DuckDuckGo) is just a Bing portal hosted on
| Azure.
| citizenpaul wrote:
| Slightly off topic, but does anyone have a good solution for
| removing content-farm search results in this or any engine?
| For example, some of the worst offenders: WikiHow, Forbes,
| Business Insider.
| ChrisArchitect wrote:
| Please keep title same as the actual post: Searx - moving away
| from DuckDuckGo
|
| It gives more context to the topic, as in it's not just a link to
| the search engine itself.
| hermitsings wrote:
| I had been considering using Searx (which I had known about
| before) lately, since I have to use DDG+Google to get
| satisfactory results.
|
| Edit: I really like DDG bangs and vim-like nav keys though
| visiblink wrote:
| Someone here once posted a link to duckduckstart.com. I have it
| set up in my search bar now.
|
| If you search, it goes through Startpage (Google results, more
| privately). If you search with a bang, it goes through
| Duckduckgo. It's probably close to what you're looking for.
| mg wrote:
| Every time I stumble across a new search engine, I add it to my
| search engine comparison tool:
|
| https://www.gnod.com/search/
|
| Will add SearX now. It seems to provide reasonably good results.
|
| Update: It's on (under 'More Engines').
| [deleted]
| Minor49er wrote:
| You should add Fireball. It's excellent
|
| https://fireball.de/
| nix23 wrote:
| A yacy instance would be good too ;)
|
| https://yacy.net/
| danskeren wrote:
| Ask.Moe
|
| nona.de
| 51stpage wrote:
| There is also Marginalia https://search.marginalia.nu/ which I
| don't see on your list.
| marginalia_nu wrote:
| It's currently undergoing its monthly maintenance just FYI.
| It's up and technically working, but with a drastically
| reduced index size.
| sysadm1n wrote:
| Tried Marginalia. So many plaintext http links which I avoid
| like the plague. That's my only gripe with it. Other than
| that, it's an awesome tool.
| forgotmypw17 wrote:
| That's funny, I personally prefer HTTP for its simplicity,
| human-readability, accessibility, lack of centralized
| control, backwards compatibility and lack of forced
| upgrades or locking out old clients, etc., not to mention
| speed.
|
| Of course, I'm fortunate enough to live in a place where
| MITM attacks are virtually non-existent, aside from WiFi
| portals and maybe ISP banners (which I've never
| experienced.)
| freediver wrote:
| > So many plaintext http links which I avoid like the
| plague.
|
| Why? What you described appears to be the safest place on
| the web.
| sysadm1n wrote:
| I only browse HTTPS sites. I have the `HTTPS Everywhere`
| addon installed with `EASE` (Encrypt All Sites Eligible)
| turned on so I don't accidentally browse an unencrypted
| website. Something like 85%-90% of the web is encrypted now,
| and there's _no_ excuse to be using outdated plaintext http
| anymore. It's a privacy and security risk. There are only a
| few instances where I had to view an http site (I'm a
| freelancer and a client's webpage was still unencrypted, so
| I had to see it - a rare exception to the rule).
| marginalia_nu wrote:
| It's funny because I've got like 70% HTTP in my index, so
| the whole "90% of the web is encrypted" seems to depend
| on which sample you are looking at. Google doesn't index
| HTTP at all, so that's not a good place to go looking for
| what's most popular. That's in fact half the reason why I
| built this search engine in the first place: they demand
| things of websites that some websites simply can't or
| won't comply with.
|
| A lot of servers still use HTTP, for various reasons.
| There are also some clients that can't use HTTPS.
| stjohnswarts wrote:
| I think there are absolute numbers and then there are
| "the sites most people visit regularly" and those
| probably are 75% https. It's relative like most things.
| marginalia_nu wrote:
| Absolute numbers are pretty hard to define, as is the
| size of the Internet.
|
| If the same server has two domains associated with it,
| does it count twice? Now consider a load balancer that
| points to virtual servers on the same machine. How about
| subdomains?
| stjohnswarts wrote:
| It may be a privacy risk, but it's certainly not a
| security risk with plain old blog and static sites that
| have completely open data available to anyone who wants
| to surf to their sites.
| freediver wrote:
| The privacy and security risk comes in large part from
| the nature of code and actions performed on the site.
|
| In reality, as far as privacy goes, matters are on
| average the opposite of your claim. Most sites that will
| put your privacy at risk today are using https - I am
| talking about the vast majority of the commercially
| operated web today. I know my privacy is much better
| respected on a plain-text (no JavaScript) site using http
| than on [insert a top 10k most popular site here] using
| https.
|
| And for security: if I am not, for example, shopping or
| entering my billing details anywhere on the site, I do
| not see how an http site can compromise my security.
|
| I actually prefer deploying http sites for simple test
| projects where speed is imperative because they are also
| faster - there is no SSL handshake needed to connect.
| marginalia_nu wrote:
| You should be getting fewer .txt results in the new update;
| part of the problem was that keyword extraction for plain
| text wasn't working as intended, so they'd usually crop up
| as false positives toward the end of any search page. I'm
| hoping that will work better once the upgrade is finished.
| mattowen_uk wrote:
| I have this REALLY old text file of search engine URLs:
|
| http://www.jaruzel.com/textfiles/Old%20Web%20Info/Internet%2...
|
| Google basically killed almost all of them off :(
|
| It would be great to see some proper competition in the search
| space, especially around specialist search engines.
| sixothree wrote:
| I would love a search engine targeted towards developers.
| Searching for symbols seems to be a problem with Google, not
| to mention all of the utterly crappy results they serve up.
| ijr wrote:
| Symbolhound does that.
| stagas wrote:
| Also grep.app for searching into repos really fast.
| jwithington wrote:
| where's you.com lol
| mg wrote:
| Thanks, added.
| 1_player wrote:
| Is anything supposed to happen when I enter something and press
| Enter? Nothing happens for me, FF on Windows, uBlock.
| reayn wrote:
| You are supposed to type something into the search field,
| then click the engine you want to use; it will pass on
| whatever you entered.
|
| I agree that the creator could make that a little clearer
| somewhere on the page.
| ColinHayhurst wrote:
| Nice work. If you have a twitter handle you might request to
| get added to these lists; either way they might be useful for
| you: https://twitter.com/SearchEngineMap/lists
| imglorp wrote:
| This is a nice compilation.
|
| It would be very interesting if it examined and compared
| results.
| m-i-l wrote:
| Yes, looks good, although I thought it was going to be a
| federated search, i.e. you enter your search term and it
| performs that search on all the sites selected. The simpler
| way of implementing a federated search would be to show
| separate results boxes from each site, although that wouldn't
| scale well to a large number of sites, and it can get quite
| complicated to try to combine the results.
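|
| The cheap way to combine is interleaving: take the first hit
| from each engine, then the second, and so on, dropping
| duplicate URLs. A sketch (ignoring scoring and latency):
|
|     from itertools import zip_longest
|
|     def interleave(result_lists):
|         # Round-robin merge of per-engine result lists.
|         merged, seen = [], set()
|         for tier in zip_longest(*result_lists):
|             for r in tier:
|                 if r is not None and r["url"] not in seen:
|                     seen.add(r["url"])
|                     merged.append(r)
|         return merged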
| wenbin wrote:
| You should add the podcast search engine
| https://www.listennotes.com/
| autoexec wrote:
| Too bad they force you to log in to view a result or do
| anything but search. They also share/sell your data to 3rd
| parties, including Google.
| mg wrote:
| Thanks, added.
|
| Holy Moly, are there really over 100 million podcast episodes
| out there?
| wenbin wrote:
| Yes. Some numbers: https://www.listennotes.com/podcast-
| stats/
|
| Listen Notes was started in early 2017 as a side project,
| when there were ~23 million episodes.
|
| I remember seeing that the number of web pages indexed by
| Google in early 1998 was ~25 million, so I thought, "OK,
| 23 million episodes might justify the existence of a
| podcast search engine" :)
| fsflover wrote:
| You should also add YaCy: https://yacy.net.
| mg wrote:
| It seems not to be web-based?
|
| When I click on "Try out the YaCy Demo Peer", I get "502 Bad
| Gateway".
| fsflover wrote:
| It's self-hosted and peer-to-peer. You could search for
| other public-facing instances, e.g.,
| http://sokrates.homeunix.net:6060. Ideally, you could run
| your own instance to show the world how it works.
| xwdv wrote:
| People don't want privacy. They want results.
|
| Society programs us to think privacy is our top concern. Is it?
| marginalia_nu wrote:
| I do think you are mostly correct. Some people really care
| about privacy, but for most people it isn't a huge concern.
|
| This doesn't make it any less important, but just means that if
| your main selling point is "we're the search engine that cares
| about privacy", then odds are you're not going to get a lot of
| users.
| lardolan wrote:
| I agree with most of your points, although it may be a matter
| of time to grow that niche. IMO people who value privacy are
| willing to accept the sacrifice of less functional results.
|
| Privacy is the most effective selling point when working with
| sensitive information.
| marginalia_nu wrote:
| If you are working with sensitive information, the last
| thing you probably want to do is broadcast that you are
| working with sensitive information.
| kreeben wrote:
| Au contraire, it's the first thing you should broadcast,
| unless you're trying to scam people out of their PII.
| [deleted]
| GhettoComputers wrote:
| Isn't hosting your own instance taking away every benefit of
| searx by revealing your IP? If you had a VPN you'd use it to
| mask your IP from search engine tracking anyway, and if you
| used Tor for it, you'd probably move back soon since the
| latency is so much worse - like how many people go back to
| Google because DDG sucks for results. I suggest just using
| public instances; you can find them at https://searx.space.
| Some are more reliable than others, but none have been
| trouble-free. There are a lot of these instances for privacy:
| Chrome has a privacy plugin with a white eye that uses
| nitter.net for Twitter, Teddit for Reddit, and other public
| instances. One instance of Reddit was even made completely
| in Rust. ;)
|
| https://chrome.google.com/webstore/detail/privacy-redirect/p...
|
| To reply to the person under me: you're always relying on
| trust in something unverified and untrustworthy, like VPNs,
| anyway. You're either revealing your IP, using a wrapper that
| reveals it instantly, using a site that isn't a search engine
| and might be using your data, using a VPN whose reputation
| rests on assumptions and which is usually based in another
| country you won't visit or know much about aside from random
| reviews, or using Tor - losing latency and reasonable-speed
| image search, and still being possibly compromised.
| kleinsch wrote:
| But if you're using instances someone else is hosting, aren't
| you hitting half the author's objections?
|
| - They may be hosted in the US
|
| - They may be hosted on AWS
|
| - You have no idea if the maintainer of the instance is
| tracking you
| jarvuschris wrote:
| ^ This right here. This article is pretty much hogwash IMO.
|
| Points 1 and 3 aren't relevant if they aren't recording the
| data. Companies in other jurisdictions have no magic
| invulnerability against their data getting out (legally or
| illegally) if they're storing it.
|
| Points 2 and 5 are equally true of any open source project
| unless you run it yourself from source. There are _plenty_ of
| examples of users getting phished by maliciously built/hosted
| open source tools
|
| Point 4 is obviously not malicious tracking and a mistake any
| project could make
|
| At the end of the day though, unless you're going to run
| everything yourself (which most people aren't), you have to
| pick who to trust -- some random person running a server
| somewhere, or a company with hundreds of employees recruited
| under the premise of working on a privacy-centric search
| engine, who could all turn whistleblower.
| jostillmanns wrote:
| Whoogle is another alternative that focuses on Google search
| results
| ced wrote:
| From the link:
|
| _The CEO sold his previous company's data before founding DDG.
| His previous company (Names DB) was a surveillance capitalist
| service designed to coerce naive users to submit sensitive
| information about their friends._
|
| Is that a fair statement? Can someone provide more context?
| boomboomsubban wrote:
| It was a failed social network to help you reconnect with old
| friends. It tried to get you to recruit your friends
| immediately after registering and had a typical social network
| license. I'd say that description is intentionally describing
| it in the worst possible light, but not wholly inaccurate.
| deltree7 wrote:
| Yet HN is down and ready to suck searx's dick because hurr
| durr privacy. This is HN in a nutshell.
|
| Hint: if all you privacy-paranoid people show the exact same
| behavior, you are an advertiser's dream. Sure, you won't be
| sold shampoo like the general population, but most advertising
| companies know the exact kind of doomer-prepper items to sell
| if you come via VPN/DDG/whatever convoluted hack/concoction
| you come up with to make your life inconvenient.
| freediver wrote:
| I always liked the term "search engine client" better (vs.
| "metasearch engine"). In essence it is a product that can
| connect to different search indexes.
|
| An "email client" does exactly the same thing - it connects to
| different email servers - and we do not call it "metaemail".
|
| edit: just realized that with the current hype around metaverse,
| 'metasearch' will probably be more appropriate for something
| searching the metaverse in the future.
| phantom_oracle wrote:
| I will add a disclaimer to this comment that it is tinfoil-
| hat and just speculation (bordering on conspiracy), but many
| of these "we are a privacy-first company" outfits might
| actually just be honeypots and fronts for 3-letter agencies.
|
| The comment is not wholly conspiratorial, considering the
| CIA-owned Swiss crypto company Crypto AG [1].
|
| It's within the realm of possibility that most of these
| privacy services could be owned by 3-letter agencies or small
| enough to be coerced into cooperation.
|
| [1]
| https://www.scmp.com/news/world/europe/article/3050193/crypt...
| marginalia_nu wrote:
| I do think it's a bit of a red flag.
|
| Sort of like how most anti-tracking browser extensions
| eventually turn out to actually be tracking extensions. Or
| like how used car dealers with names like "Honest Bob's Cheap
| Luxury Cars" often turn out to be neither honest, cheap nor
| luxurious.
| GhettoComputers wrote:
| Isn't that confirmation bias? uMatrix and uBlock are
| reliable, the opposite being PrivacyBadger. The EFF has lost
| my trust before but I never assume maliciousness before
| incompetence. https://old.reddit.com/r/privacytoolsIO/comment
| s/l2dges/why_...
| marginalia_nu wrote:
| The list of browser extensions that have in some form
| backpedaled from their central premise and main function is
| pretty long: Ghostery, Adblock, AdblockPlus, ...
| GhettoComputers wrote:
| I don't disagree that it's a lot. NoScript was another
| example; uBlock and uMatrix, through no fault of their
| own, were also hijacked, being open source; Ghostery was
| sold; and Adblock Plus with acceptable ads wasn't as bad
| as they said. It was widely reported, and I continued
| installing ABP, since it was easy, it wasn't hard to turn
| off acceptable ads, and I think the direction they tried
| to move the industry in wasn't harmful. I might have moved
| back to Adblock or learned about hosts files, but if they
| had been successful we'd have less resource-hungry ads, a
| net benefit for everyone, especially when using public
| computers or helping someone with IT.
|
| Ghostery was as widely reported as Audacity adding
| telemetry. Everyone who cared knew long before to leave
| or uninstall it.
|
| Hosts blocking is reliable and I've never had a single
| malicious list in the wide assortment I've used. PiHole
| hasn't been hijacked either, and I think it's unreasonable
| to believe that no group can make mistakes, that faltering
| can't ever happen. I really don't think Adblock Plus was
| that bad.
|
| If the market weren't saturated with methods to block, I
| would have stuck with them if they were remorseful.
|
| -Sent from my not-private Apple device, which I'll still use
| since it's got a huge userbase for messaging in the US
| jqpabc123 wrote:
| Haven't you heard? The CIA has gone open source. They don't
| need to own a company anymore.
|
| They can just download the Searx source code, modify it as
| they see fit, and make it available on a server someplace.
|
| Can you prove that searx.be isn't run by a "3-letter agency"?
| Can you prove that the source code running at searx.be is the
| same as on GitHub?
|
| The point being --- unless you have full access to the server,
| open source means nothing with regard to the privacy and
| security of any service. It actually means less than nothing
| --- it means it is super easy to build into a honeypot.
| marc_abonce wrote:
| Of course, there's no fool-proof way of knowing what code is
| running on the server side, but https://searx.space at least
| shows if an instance modified its client-side code, which you
| can see in the HTML column.
|
| To keep server-side code from identifying you, you can
| access an instance through Tor. Of course, you could try to
| do that with any other search engine, but most of the other
| search engines either block exit nodes or provide incomplete
| functionality if you disable JS.
|
| It's not perfect, but it may be good enough depending on your
| threat model.
| jqpabc123 wrote:
| Note to the CIA --- don't modify the client side code when
| building your honeypots.
|
| Personally, I just use a VPN with the "lite" version of
| DuckDuckGo --- no JS.
|
| https://lite.duckduckgo.com/lite
| ColinHayhurst wrote:
| SearX is a project which we respect and a positive contribution
| to improving search choice. Consideration of how it might be
| being used is wise.
|
| It's also wise to do due diligence on any company/service
| where you are revealing sensitive personal information.
| Traffic coming from Google for sensitive medical search
| queries was a catalyst for us going public in 2006 with our
| strict no-tracking policy, and we have maintained that
| position.
|
| We have yet to be contacted by authorities, but you'll have to
| trust us on that one for now. Since we don't log any personal
| or identifying data at all, we would have nothing to share [0].
| You can read about our investors on our blog.
|
| Building and maintaining a search engine with independent
| infrastructure is a huge challenge and has meant building
| proprietary IP over many years. Since we refuse to use
| growth-hacking techniques such as analytics from you know
| who, and all tools involving any tracking, marketing is a
| bigger challenge than it is for companies without strong
| principles. It has been a mammoth effort, mostly by our
| founder, whose story you can read here [1].
|
| [0] https://www.mojeek.com/about/privacy/ [1]
| https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm...
| sleepysysadmin wrote:
| The thing is... let's say the CIA/NSA are tapping searx,
| wholly or just some instances. What exactly are the
| ramifications? I feel like they are going to be largely
| missing the target. A bunch of tech-savvy nerds trying
| different search engines aren't going to be terrorists.
|
| And even if they are? As a Canadian, or anyone who isn't in
| the USA, what exactly is the point? Wouldn't this effectively
| be the safest host? The CIA/NSA won't be selling your private
| info. They won't be sending me to a black site because I look
| at Python documentation and YouTube chill music.
| kwhitefoot wrote:
| > CIA/NSA won't be selling your private info.
|
| Why not? They used to sell cocaine, after all, and your info
| is probably rather less risky.
| sleepysysadmin wrote:
| > Why not? They used to sell cocaine, after all, and your
| info is probably rather less risky.
|
| It would reveal the operation, ruining any potential for
| catching terrorists.
| sweetbitter wrote:
| The purpose of government SIGINT (signals intelligence) is
| certainly not to catch terrorists/pedophiles/money
| launderers. Those activities are generally
| tolerated/endorsed by intelligence agencies, as they are
| not heinous enough to garner their ire; they even help
| whenever they coerce someone into committing a terrorist
| attack. The true purpose of all of that data is to create
| a metadata map and to assess who is up to what and who can
| do what, such that the power of their nations over the
| world can be maintained as long as possible.
| [deleted]
| phantom_oracle wrote:
| I should have added that my comment applies as much to DDG
| as it does to cheap-VPN-provider-35 with a shell company in
| Belize.
|
| The original comment was in reference to DDG proudly making
| claims of not getting requests from .gov and marketing
| themselves as a company that "cannot see what you search
| for".
| tandav wrote:
| Just searched a couple of queries like "opencv rectangle",
| "python regex" - and it returned nothing
| unixfox wrote:
| Which instance did you use?
| ColinHayhurst wrote:
| https://www.searchenginemap.com/
| account-5 wrote:
| That site is horrible on mobile; a good portion of the screen
| is taken up by the orange "download/view" infographic thing.
| Interesting, though, to see how connected the engines are. I
| would have thought DDG would be bigger with its bang option,
| though I assume it's about what is natively included in the
| results.
| ColinHayhurst wrote:
| Yes, it is horrible on mobile. The sizes of all the
| syndicating search services are shown the same. An update is
| overdue.
|
| A complementary Twitter list is maintained here:
| https://twitter.com/SearchEngineMap/lists
| freediver wrote:
| I believe rightdao.com is missing from the list. It has an
| independent index (and also impressive speed).
|
| Also not sure what the criteria for inclusion are, but
| search.marginalia.nu and teclis.com both have their own
| indexes.
| agluszak wrote:
| Great! I wish there were a way to blocklist certain domains
| (who wants to see Quora in their results...). This should be
| easy to implement on Searx's side. Another feature I often
| wish for is searching within a specific time period. It's so
| annoying, for example on Youtube, when I remember that a
| video was released in 2011, but there's simply no filter for
| it.
| dalf wrote:
| > I wish there was a possibility to blocklist certain domains
|
| You can do that in this fork:
| https://github.com/searxng/searxng/blob/e839910f4c4b085463f1...
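|
| The underlying filter is simple either way; a sketch of
| post-filtering results against a domain blocklist
| (hypothetical helper, not searx's plugin API):
|
|     from urllib.parse import urlparse
|
|     BLOCKLIST = {"quora.com", "pinterest.com"}
|
|     def keep(result):
|         host = (urlparse(result["url"]).hostname or "").lower()
|         # Drop the blocked domain itself and any of its subdomains.
|         return not any(host == d or host.endswith("." + d)
|                        for d in BLOCKLIST)
|
|     def drop_blocked(results):
|         return [r for r in results if keep(r)]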
| herodotus wrote:
| Could this be installed on a Raspberry Pi? I am very happy with
| my Raspberry Pi-hole: would not mind adding a second pi for
| searching.
| skerit wrote:
| Alright, going to try searx.be for a while then.
| jqpabc123 wrote:
| I heard it could be a honeypot for the CIA. But feel free to
| prove me wrong.
| unixfox wrote:
| If you don't want to use a single searx instance then feel
| free to use a random one automatically for each search thanks
| to this tool which can be used locally:
| https://searx.neocities.org/
| jqpabc123 wrote:
| I heard these searx instances could be linked together in a
| honeypot network run by the CIA.
|
| But feel free to prove me wrong.
| schleck8 wrote:
| Metager is a non-profit, open-source search engine running fully
| on renewable energy. It also has a proxy for opening results
| anonymously
|
| https://metager.de
| czechdeveloper wrote:
| That seems quite exclusive to Germany
| hermitsings wrote:
| https://metager.org/ for english users
| abetusk wrote:
| Search engines have been coming up lately, so maybe this is as
| good a place as any to discuss some back-of-the-envelope
| calculations.
|
| Let's say we wanted to recreate the web index made by Google. How
| much cost and engineering would it take?
|
| Estimating from worldwidewebsize.com [0], the size of the web
| is around 50 billion pages (5 x 10^10). The average web page
| size looks to be on the order of 1.5 MB (1.5 x 10^6 bytes)
| [1]. The nominal cost of hard disk space is about $0.02/GB
| [2].
|
| So, roughly, that's 75 petabytes of data (~7.5 x 10^16
| bytes). At $0.02/GB, that gives roughly $1.5M just to buy the
| hardware to store (a significant fraction of?) the web. The
| Hutter Prize [3] exists, so maybe there's some confidence
| that we only need to actually store 1/10 of that, bringing it
| down to around $150k.
|
| For perspective, that's 10 multi-millionaire Silicon Valley
| types donating about $150k each, 100 "engineer types" at $15k
| each, or 1,000 to 10,000 pro-active citizens at $1.5k to $150
| each (just for the hard disk space, discounting energy,
| bandwidth and other operating costs).
|
| If we extrapolate falling hard disk costs, taking the price
| halving time to be about 2.5 years from a current
| (pessimistic?) cost of $0.02/GB, that's about 10-15 years
| before a petabyte-scale hard drive is available to the
| consumer for $1000.
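|
| That arithmetic, as a quick Python sanity check:
|
|     import math
|
|     PAGES = 50e9           # estimated indexed pages [0]
|     PAGE_BYTES = 1.5e6     # average page size, ~1.5 MB [1]
|     USD_PER_GB = 0.02      # hard disk cost per GB [2]
|
|     total_gb = PAGES * PAGE_BYTES / 1e9   # 7.5e7 GB, i.e. 75 PB
|     print(total_gb * USD_PER_GB)          # ~1.5 million USD
|     print(total_gb * USD_PER_GB / 10)     # ~150k USD, 10x compressed
|
|     # Petabyte drive for $1000: 1 PB is $20k at $0.02/GB; halving
|     # every 2.5 years gives 2.5 * log2(20) ~ 11 years.
|     print(2.5 * math.log2(20_000 / 1_000))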
|
| From my perspective, I would ask: why hasn't a decentralized
| search index been created and/or come into wide use? My guess
| is that figuring out a robust enough system that's cheap
| enough is still out of reach. $150 might not seem like a lot,
| but you have to convince 10k people to devote energy just to
| search.
|
| Put another way: when does the landscape change enough that
| decentralized search is a viable option? My guess is that the
| determining factor is when people can store a significant
| fraction of the web locally for nominal cost. Maybe some
| great compression and/or AI sentiment analysis can be done to
| bootstrap, and maybe some type of financial incentive can
| help, but my bet is these will only provide a light push in
| the right direction, and the needed technology is the
| underlying cheap disk space.
|
| As a side note, worldwidewebsize.com [0] shows the number of
| pages indexed by Google holding pretty constant over a five-
| year period, with a sharp decline somewhere in 2020. I wonder
| if this is an artifact of the estimation method or if Google
| has changed something significant in their back end to alter
| their search engine and storage.
|
| [0] https://www.worldwidewebsize.com/
|
| [1] https://www.pingdom.com/blog/webpages-are-getting-larger-
| eve....
|
| [2] https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/
|
| [3] http://prize.hutter1.net/
| kristianpaul wrote:
| Doesn't searx look for results at duckduckgo and google for
| you anyway? What's the difference from using ddg directly?
| sebow wrote:
| I don't think it uses DDG directly. But anyway, you can
| configure the sources for files, media, wiki, etc. Makes
| sense since the engine is open source, but then again it's
| not really a search engine itself but a metasearch one
| BeetleB wrote:
| Searx can ping multiple search engines, including those not
| supported by DDG. For example, searx has a dedicated file
| search, which includes torrents.
| boudin wrote:
| There are other sources available. It is a metasearch engine,
| so it will always rely on other sources, but you can disable
| the duckduckgo and google backends.
| luciusdomitius wrote:
| Isn't duckduckgo just an alternative frontend for bing with
| an integrated ad/tracking blocker? Or at least that's what
| they claim.
| nicce wrote:
| It is. They say they add some indexing of their own, but the
| results are all the same.
___________________________________________________________________
(page generated 2021-11-12 23:00 UTC)