[HN Gopher] Marginalia: 3 Years
       ___________________________________________________________________
        
       Marginalia: 3 Years
        
       Author : latexr
       Score  : 227 points
       Date   : 2024-02-25 14:25 UTC (8 hours ago)
        
 (HTM) web link (www.marginalia.nu)
 (TXT) w3m dump (www.marginalia.nu)
        
       | marginalia_nu wrote:
       | Throwback that gives some indication of how well, and at the
       | same time how questionably, it worked a mere 6 months in:
       | https://news.ycombinator.com/item?id=28550764
       | 
       | Though I think now there's a bit too much reddit and
       | stackexchange and wikipedia stuff in the default filter.
        
       | RandomWorker wrote:
       | I've got this bookmarked and use it to find hyper niche materials
       | on numerical modelling. The stuff it finds on solvers, mesh
       | generation, and optimization methods is so much better than
       | anything I could ever find on Google. Stuff from the 80s and 90s.
       | I've found sites written by professionals that I would never find
       | on Google. As someone that doesn't just take the commercial
       | package off the shelf, getting to that knowledge, maybe finding
       | Fortran code examples, is extremely valuable.
        
         | perihelions wrote:
         | Do you have an example of a niche expert page you found useful,
         | which is easier to find on Marginalia than Google?
        
           | RandomWorker wrote:
           | Search "numerical solver" in any of the major search engines;
           | what do you see?
           | 
           | Now do that on Marginalia; you find https://www.scilab.org
           | within the first 10 results, which is an open-source
           | numerical solver software, which gets me to code, which gets
           | me to examples to use.
           | 
           | To be nuanced, I could change my search to "open-source
           | Runge-Kutta numerical solver examples" or something better,
           | but why? Give me the weird deeply technical stuff first.
           | 
           | Maybe more a HN example; when I wanted to learn about load
           | balancers, just search "load balancers."
           | 
           | Google; lots of SEO crap (with soooo many ads), youtube
           | videos?, and AWS commercials at the top. No idea where I'm
           | going.
           | 
           | Marginalia; a linux wiki, nginx (official documentation), a
           | couple blogs by professionals on the topic. yeah, there are
           | some things in here that ain't great.
           | 
           | But if I compare apples to apples, the first ten results are
           | just so much better. I'd say 8/10 in Marginalia for this
           | example are great to good, while the first 10 on Google
           | (things I could click on) are by companies that don't teach
           | me anything or have articles full of ads.
        
             | dewey wrote:
             | I don't think your examples are very good. These are very
             | generic search terms, and whether the results are good or bad
             | very much depends on the person searching and what they are
             | looking for.
        
               | marginalia_nu wrote:
               | Underspecified queries ("generic search terms") are
               | actually one of the tricky problems in search. The way
               | Marginalia Search deals with them caters to a particular
               | type of audience and set of usecases. I don't think
               | that's wrong. Seems silly to try to cater to everyone. In
               | that scenario you're more likely to not cater to anyone.
        
               | subtra3t wrote:
               | Ah, this helped clarify my doubts. Thanks.
        
               | eviks wrote:
               | Your "domain" filters seem like a good solution to this?
        
               | marginalia_nu wrote:
               | Yeah. Some of them are a bit rough still, but that's the
               | general idea. Instead of trying to guess what sort of
               | content the user wants, it seems like it makes sense to
               | just give them the option to express that.
               | 
               | The recipe filter is approaching something I'd want to
               | explore further, to be able to provide contextual
               | information outside of the search query.
               | 
               | https://search.marginalia.nu/search?query=cookie+recipe
        
               | ignoramous wrote:
               | > _particular type of audience and set of usecases_
               | 
               | For signed in accounts (which is pretty much ~3bn
               | Android and/or Chrome users), Google can predict what the
               | user might prefer and yet...
        
               | marginalia_nu wrote:
               | I disagree with this feature. Besides the predictable
               | privacy argument, having a search engine transparently
               | serve results according to your tastes makes it really
               | difficult to find things that are new and outside of your
               | existing preferences. It drains the web of serendipity,
               | makes every website feel the same.
        
               | 48864w6ui wrote:
               | Exactly: marginalia results are good for geeks; I search
               | for me, not for J Random Consumer.
        
               | Kamq wrote:
               | They're actually perfect examples for this thread.
               | 
               | We've already constrained "what we're looking for" to be
               | "niche expert pages" further up thread. If we're seeing
               | niche expert pages even for generic search results,
               | that's probably a good indication that the search engine
               | behaves the way RandomWorker is describing
        
             | kstrauser wrote:
             | It's result 10 on Kagi.
        
             | eviks wrote:
             | The quoted phrase finds nothing, and unquoted also shows no
             | scilab. Then I realized I'd made a typo (it's numerical, not
             | numeric), and then I get it.
             | 
             | Google was ~ top 60, which for such a generic term seems
             | fine, not much scrolling down
        
             | bluish29 wrote:
             | Kagi gives me scilab as result #13, and that is probably
             | because I raise arxiv and stackoverflow results, which
             | then rank high.
        
       | unpopularopp wrote:
       | Tried my last 3 Google searches
       | 
       | india test cricket lowest total > None of the results are good or
       | giving an answer
       | 
       | raid calculator > The results are OK but you still have random
       | noise like a Pokemon save/cheat editor page because it contains
       | the word raid
       | 
       | all quiet on the western front movie book differences > 0
       | results. Like straight up no hits, an empty page
        
         | marginalia_nu wrote:
         | > india test cricket lowest total > None of the results are
         | good or giving an answer, straight up wrong sites.
         | 
         | The search engine has no ambitions to provide a knowledge graph
         | at this point. It's for finding documents on the internet,
         | rather than answering questions. Answering questions is
         | definitely something one might want, but it often comes at the
         | expense of finding documents.
         | 
         | > raid calculator > The results are OK but you still have
         | random noise like a Pokemon save/cheat editor page?
         | 
         | The pokemon result was discussing an application called
         | "raidcalc". Seems like a good match, given the search engine
         | does not profile you at all and has no clue about what your
         | interests are.
         | 
         | > all quiet on the western front movie book differences
         | 
         | Hmm, I think there's an upper bound on the query length you
         | hit. Could probably remove this, it's pretty old, an artifact
         | from when the query execution didn't deal with long queries
         | well.
         | 
         | --edit--
         | 
         | Hmm, I increased the limit but they're still kinda not very
         | good. Although this is definitely squarely within the realm of
         | what I'm working on next, which is query understanding and
         | execution.
         | 
         | Right now the search engine doesn't really know how to group the
         | terms. Like a human being can see that you'd want
         | 
         | |all quiet on the western front| in a sequence, preferably in
         | the title or appearing a few times, and 'movie', 'book', and
         | 'differences' should be important to the document, but not
         | necessarily appear in that exact order.
         | 
         | The search engine currently looks for either documents where
         | they all appear in proximity, or all individual words have high
         | tf-idf relevance markers. Not great for this query.
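
A minimal sketch of the two matching strategies described in the comment above: a document qualifies if all query terms appear close together, or if every term carries a reasonable tf-idf weight. This is an illustration only, not Marginalia's actual code. The function names, the whitespace tokenizer, the smoothed idf formula, and the `max_window` and `min_weight` thresholds are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; a real engine does far more).
    return text.lower().split()


def tf_idf_scores(query_terms, docs):
    """Return one dict per document mapping each query term to its tf-idf weight."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        per_term = {}
        for term in query_terms:
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0  # smoothed idf
            per_term[term] = tf * idf
        scores.append(per_term)
    return scores


def min_window(query_terms, tokens):
    """Length of the shortest token span containing every query term, or None."""
    needed = set(query_terms)
    hits = [(i, t) for i, t in enumerate(tokens) if t in needed]
    counts = Counter()
    covered = 0
    best = None
    left = 0
    for pos, term in hits:
        counts[term] += 1
        if counts[term] == 1:
            covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == len(needed):
            left_pos, left_term = hits[left]
            width = pos - left_pos + 1
            if best is None or width < best:
                best = width
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


def matches(query_terms, doc, per_term, max_window=8, min_weight=0.05):
    """True if the terms appear in proximity OR all have a high enough tf-idf."""
    window = min_window(query_terms, tokenize(doc))
    in_proximity = window is not None and window <= max_window
    all_relevant = all(per_term[t] >= min_weight for t in query_terms)
    return in_proximity or all_relevant
```

Neither branch knows that "all quiet on the western front" is a unit, which is why a long mixed query like the one in this thread does poorly under such a scheme.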
        
           | klabb3 wrote:
           | Is it possible to just quote the title of the book, old-
           | school style, so it becomes a single phrase?
           | 
           | It is arguably a better UI than handing a barrage of words
           | and hoping the engine does the sense-making.
        
             | marginalia_nu wrote:
             | Not yet, the support for long quoted sentences is a bit
             | sketchy. Also within the wheelhouse of what's up next
             | though. Having solid support for manual grouping is pretty
             | much a prerequisite for automatic grouping anyway.
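
A small sketch of the manual grouping discussed above, where a quoted span is treated as one exact phrase and the remaining words may appear anywhere. The helper names and the simple whitespace tokenization are hypothetical, chosen for illustration; nothing here reflects how Marginalia parses queries.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases (token lists) and loose terms."""
    phrases = [p.lower().split() for p in re.findall(r'"([^"]+)"', query)]
    phrases = [p for p in phrases if p]  # drop quotes holding only whitespace
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = remainder.lower().split()
    return phrases, terms


def contains_phrase(tokens, phrase):
    """True if the phrase tokens occur consecutively, in order, in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def doc_matches(query, doc):
    """Every quoted phrase must appear verbatim; loose terms may be anywhere."""
    phrases, terms = parse_query(query)
    tokens = doc.lower().split()
    return (all(contains_phrase(tokens, p) for p in phrases)
            and all(t in tokens for t in terms))
```

With this split, `"all quiet on the western front" movie book differences` yields one six-word phrase plus three loose terms. An automatic grouper would have to infer that same structure without the quotes.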
        
           | mdaniel wrote:
           | I hope this tone comes across correctly as just a suggestion:
           | I get _a lot_ of mileage out of the "Send Feedback" option
           | in DDG, which they claim actual humans do read. It can help
           | move bug reports out of these HN threads into a more context-
           | aware flow, and also makes me feel like any bad outcome has
           | the possibility of improving, unlike systems that don't
           | provide an "I feel bad about this experience" button
           | 
           | If you were thus inclined,
           | https://gitlab.com/glitchtip/glitchtip#glitchtip is the
           | actual open source Sentry implementation which (as far as I
           | know) would enable gluing
           | https://docs.sentry.io/platforms/javascript/user-
           | feedback/#u... to the search results page (that client-side
           | library is still MIT: https://github.com/getsentry/sentry-
           | javascript/blob/7.102.1/... )
        
       | doubloon wrote:
       | yup this is definitely a lot like how it used to be.
       | 
       | @unpopularopp can't find "all quiet on the western front book movie
       | differences". well you couldn't do that with AltaVista either in
       | 1998.
       | 
       | however if you just type "all quiet on the western front" you get
       | a ton of niche obscure sites talking about it. literally
       | someone's personal blog page.
       | 
       | type in 'polytopes' you get a bunch of university papers and
       | code sites.
       | 
       | "rust generics" - again, it's a bunch of mailing list discussions,
       | blogs, rust discussion groups, personal websites, obscure
       | professional discussions.
       | 
       | this IS how it was back in the day.
       | 
       | my only question is how could this possibly be sustainable
       | financially in the long run.
        
         | marginalia_nu wrote:
         | > my only question is how could this possibly be sustainable
         | financially in the long run.
         | 
         | For now I'm funded by grants and donations, got a few years
         | runway that way.
         | 
         | The actual operational cost is like $100/month for colocation +
         | personal expenses so what money comes in lasts a surprisingly
         | long time. In the future, we'll see. There does seem to be a
         | lot of people that want this type of thing to exist though, so
         | the hope is if I polish it even more, further funding will
         | become available from likeminded people, possibly selling API
         | access to other search engines.
         | 
         | Search is notoriously hard to make money from (outside of ads),
         | though not having a lot of expenses seems like a reasonable
         | path to go.
        
           | 48864w6ui wrote:
           | It sounds like you only need one person (not as deep pocketed
           | as Andrew Carnegie but who has read "gospel of wealth" and
           | agrees with it) to have support for decades if not
           | perpetuity.
           | 
           | Universities traditionally have done this sort of thing by
           | playing golf and naming buildings, but I'm sure in the 21st
           | century there are other models. (Fwiw $2k/yr is below a
           | typical golf membership)
        
           | gary_0 wrote:
           | I think as long as you're not setting out to start a tech
           | company with thousands of employees, or branch out into a
           | sector with the word "cloud" in it, you'll be fine. Only
           | unreasonably big ambitions cost billions.
           | 
           | A project is usually on the road to success when it starts
           | with a disclaimer like "just a hobby, won't be big and
           | professional like gnu".
           | 
           | I think a larger concern is how you'll address the Bus Factor
           | going forward.
        
             | mdaniel wrote:
             | > I think a larger concern is how you'll address the Bus
             | Factor going forward
             | 
             | I can't speak to how much energy it is to go from code to
             | serving requests, but FWIW the code is AGPLv3 and seems to
             | be updated regularly https://github.com/MarginaliaSearch/Ma
             | rginaliaSearch/blob/v2...
        
               | marginalia_nu wrote:
               | I recently put some effort into making it possible to run
               | and host the system fairly easily[1]. That said, serving
               | basic search data and operating a search engine are two
               | different things. To do more than index a couple of blogs
               | you inevitably need a fairly deep understanding of the
               | system, probably decent hardware, and so on.
               | 
               | But the long term goal is that this is something that's
               | relatively easy to operate and extend.
               | 
               | [1] https://www.youtube.com/watch?v=PNwMkenQQ24 (quick
               | install and demo)
        
       | ttt3ts wrote:
       | Cool engine. Going to check out source soon but "ROME2D16-2T"
       | returned relevant results from esoteric sources. Useful.
        
       | NeutralForest wrote:
       | Congrats on the progress, I don't use marginalia as much as I
       | should because I'm so used to relying on Google. It's a wonderful
       | project though and I'll prob use it more since spammy SEO sites
       | and AI generated answers seem to get more prevalent.
        
         | marginalia_nu wrote:
         | Probably some ways away from daily driver material.
         | Optimistically sometime this summer when I'm done with the
         | query and execution stuff it'll start approaching that
         | territory.
        
       | aqfamnzc wrote:
       | I've been impressed by the results I see there. And you've chosen
       | a sick name for it.
        
       | behnamoh wrote:
       | I just looked up "transformers intuition" and the results blew my
       | mind. In comparison, Google's results led me to SEO'd websites
       | (mostly Medium) and fancy-looking sites with inferior content.
       | Awesome work Marginalia!
        
       | 101008 wrote:
       | did a search and first results were from "stack exchange sci fi",
       | i was expecting something more nostalgic
        
         | marginalia_nu wrote:
         | Try the vintage or tilde filter.
         | 
         | https://search.marginalia.nu/search?query=anime&profile=vint...
         | 
         | https://search.marginalia.nu/search?query=anime&profile=tild...
        
       | renegat0x0 wrote:
       | Most important lines for me.
       | 
       | It's proving a bit harder than anticipated, not because the
       | software can't handle it, but because the signal to noise ratio
       | of the web isn't very good; a huge reason why the search engine
       | works relatively well is because of what it doesn't index.
        
       | InvOfSmallC wrote:
       | Do you offer an API?
        
         | marginalia_nu wrote:
         | https://api.marginalia.nu/ :-)
         | 
         | Demo key is always under siege though.
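
A hedged sketch of how a client might build a request against the endpoint linked above. Only the host comes from the thread. The path layout `/{key}/search/{query}`, the `index` and `count` parameters, the demo key name `public`, and the JSON response are all assumptions that should be checked against the API page before use.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://api.marginalia.nu"  # host taken from the comment above


def search_url(query, key="public", index=0, count=10):
    """Build a search URL. Path layout and parameter names are assumptions."""
    params = urlencode({"index": index, "count": count})
    return f"{API_ROOT}/{quote(key, safe='')}/search/{quote(query, safe='')}?{params}"


def search(query, key="public", count=10):
    """Fetch results, assuming (unverified) that the endpoint returns JSON."""
    with urlopen(search_url(query, key=key, count=count), timeout=10) as resp:
        return json.load(resp)
```

Since the demo key is described as being under heavy load, a site integration like the one mentioned below would presumably want its own key and some caching of responses.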
        
           | InvOfSmallC wrote:
           | Thanks, I'm building a website focused on Metroidvanias. I
           | liked the results so I was thinking I may use it to offer
           | some interesting results on the various game pages.
        
       | dreamcompiler wrote:
       | Viktor- I'm curious as to whether Common Crawl [0] would be
       | useful to you. It's currently around 100TB and 3.35 billion
       | pages, so it's going to be a long download unless you process it
       | in place on S3. I have no idea what its signal/noise ratio is.
       | 
       | [0] https://commoncrawl.org/overview
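
To put the "long download" above in numbers, here is a rough back-of-the-envelope sketch. The 100TB and 3.35 billion page figures come from the comment. The 1 Gbit/s link speed is an assumption picked for illustration, TB is taken as decimal (10^12 bytes), and the comment does not say whether the size is compressed.

```python
CRAWL_BYTES = 100 * 10**12   # ~100 TB, from the comment (decimal TB assumed)
CRAWL_PAGES = 3.35e9         # ~3.35 billion pages, from the comment


def download_days(size_bytes, bits_per_second):
    """Days needed to transfer size_bytes over a fully saturated link."""
    seconds = size_bytes * 8 / bits_per_second
    return seconds / 86_400


def average_page_bytes(size_bytes, pages):
    """Mean stored size per page."""
    return size_bytes / pages


if __name__ == "__main__":
    # Hypothetical 1 Gbit/s link, sustained with no overhead.
    print(f"{download_days(CRAWL_BYTES, 10**9):.1f} days at 1 Gbit/s")
    print(f"{average_page_bytes(CRAWL_BYTES, CRAWL_PAGES) / 1000:.1f} kB per page")
```

Under these assumptions the transfer takes a bit over nine days of sustained full-rate download, and the average page works out to roughly 30 kB.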
        
       ___________________________________________________________________
       (page generated 2024-02-25 23:01 UTC)