[HN Gopher] Why is Confluence Wiki Search so bad?
       ___________________________________________________________________
        
       Why is Confluence Wiki Search so bad?
        
       The title says it all. To me, the most important component of a
       wiki is search. With that said, why is confluence wiki search
       basically unusable?  (by unusable, I mean I can never find the page
       I am looking for when I search. Basically, I have to maintain my
       own wiki of important links I may need to reference in the future)
        
       Author : nicktorba
       Score  : 128 points
       Date   : 2021-09-20 19:39 UTC (3 hours ago)
        
       | jacquesm wrote:
       | Try gmail. More than a decade on and _still_ no partial word
       | match.
        
         | lelandfe wrote:
         | And from a search company, no less.
        
         | jfrunyon wrote:
         | I usually have better luck with the "autocomplete" results in
         | gmail search than with the actual search results. I don't even
         | know how you manage to screw up your core competency that
         | badly.
        
         | jstummbillig wrote:
         | Gmail produces by far the best search results for me (comparing
         | to apple mail and thunderbird) and makes me reach for it
         | regularly for search alone, which I find pretty annoying. If
         | there is anything better out there I am all ears.
        
           | jsjohnst wrote:
           | > If there is anything better out there I am all ears.
           | 
           | Mutt, Pine, grep, awk, etc. I don't understand why throwing a
           | GUI interface on top automatically seems to make email search
           | absolutely awful, this includes Gmail. I so often need to
           | find a specific old email using a hazy match criteria that I
           | am half tempted to pipe my email into Splunk (I run a small
           | Splunk cluster at home for other needs) and use it (as then I
           | don't need a local copy of every email on all devices or
           | need to SSH into a central box to do a TUI based search)
        
       | modeless wrote:
       | Because "enterprise" tools are bought by people who don't have to
       | use them, so improvements that actually matter to users are not a
       | priority.
        
       | itomato wrote:
       | Lucene
        
       | walrus01 wrote:
       | I don't understand why people use confluence.
       | 
       | I can gain far more functionality with a properly implemented
       | self-hosted mediawiki server (the same code that runs wikipedia
       | itself) with a number of useful plugins installed and enabled.
       | 
       | It doesn't require a rocket science level of apache2+php7+mariadb
       | knowledge to set up. The instructions are really quite
       | straightforward.
        
         | ivan_gammel wrote:
         | In a corporate environment, paying for a Confluence Cloud
         | subscription can be cheaper than having even a part time admin
         | to install and maintain a self-hosted solution (proper security,
         | backups, handling compatibility issues on updates etc etc). It
         | may not be the best solution, but it is good enough.
        
           | jfrunyon wrote:
           | In corporate environment how do you not already have an admin
           | who can handle this just like they handle any of your other
           | self-hosted needs? I've never worked for a single company
           | that didn't have _something_ hosted internally.
        
         | MegaDeKay wrote:
         | We started with a self-hosted mediawiki server and this did not
         | go well. Expecting someone not very computer savvy (and there
         | are lots of those in my company) to dive into the markup on a
         | page and not make a mess of it was a bad idea. At that time at
         | least the WYSIWYG editor was not very usable. Don't know if that
         | is still the case.
         | 
         | So off we went to Atlassian. It has many flaws, but nobody is
         | pining for the old days of Mediawiki. And the hooks Confluence
         | has in to Jira is something you don't get with plain Mediawiki,
         | and that has real use for us.
        
           | jfrunyon wrote:
           | You can literally go see for yourself how the WYSIWYG editor
           | works these days. I suspect it's come a long way since the
           | last time you checked it.
           | 
           | My bigger question though is why the average user is
           | important. Most large companies have employees whose entire
           | job is ... knowledge management. If they can't figure out how
           | to write wikitext then maybe they're not a good fit for the
           | role?
        
         | rablackburn wrote:
         | If this is a serious question, this is why:
         | 
         | Confluence users are enterprise companies, and getting a self-
         | hosted server up and running is too much pain to be bothered to
         | deal with.
         | 
         | This is a process problem. The steps to get one would be
         | something like:
         | 
         | - try and find the "provision a server" option in the corporate
         | service portal (there probably isn't one)
         | 
         | - ask someone if they know how to provision one. Get a link to
         | a separate system where you can make the request
         | 
         | - you need to associate the instance with a cost centre, or
         | maybe you literally need a credit card number, don't forget to
         | attach written manager approval
         | 
         | - update the project's budget to include the unexpected cost of
         | this internal service. Hopefully there's actually some margin
         | to afford it.
         | 
         | - wait a day or two for the request to go through
         | 
         | - get the instance details, RDP in and try and set everything
         | up. Realise you need to make a separate request for admin
         | rights to install non-base software if you don't want to use
         | IIS and MSSQL server
         | 
         | - wait a day for admin rights. Don't forget to add written
         | manager approval to the request or else it will be denied
         | 
         | - realise you need to make a separate DNS request to get a
         | friendly url for the team to access it. Also, how are you going
         | to secure access to just your team members? Need to integrate
         | with the corporate AD
         | 
         | - ...about a dozen more steps
         | 
         | Compare all of that with:
         | 
         | - Go to the corporate confluence instance
         | 
         | - click "Create", add your team members with edit rights.
         | 
         | - done
         | 
         | Confluence itself may not be a great experience to use, but
         | it's solving the problem of getting to the point of having a
         | wiki setup in the first place.
        
           | jfrunyon wrote:
           | > getting a self-hosted server up and running is too much
           | pain
           | 
           | And yet many of them self-host Confluence. And many other
           | things. And provision servers all the time. And you have to
           | provide a CC (or maybe PO) for Confluence in any case. And
           | you _can't_ just associate Confluence with a cost centre.
           | And you have to budget it. And... literally every single one
           | of your arguments applies just as much to Confluence.
        
           | GordonS wrote:
           | And firewall rules!
        
           | GordonS wrote:
           | Oh, and updating the CMDB too!
        
       | sharva wrote:
       | Yes both Jira and Confluence search are frustrating at times.
       | This is one of the big wins of using Glean (https://glean.com)
       | for me as a developer :-)
        
       | abeppu wrote:
       | I'll take a stab at actually guessing why aside from the issue
       | that people making purchasing decisions don't see how bad it is
       | until work has already gone into bringing in docs and pushing
       | people to use it.
       | 
       | Aside from the organizational issues, I think there's a problem
       | where basically no search system can be good for every org with
       | any kind of internal info and different queries from perhaps
       | several distinct types of users with different goals. To get
       | good, a system needs to improve through at least rudimentary ML.
       | At its simplest, if Alice searches for X today and clicks doc3,
       | if Bob searches for X tomorrow, doc3 should rank higher. This
       | requires collecting and aggregating click stream data, and using
       | this count info (with cardinality #docs x #queries) at search
       | time. But sometimes it requires a richer model relating search
       | terms to terms in relevant (clicked) docs and optimizing for some
       | measure of search quality (NDCG) etc. All of this requires
       | detailed access to docs, search/click histories, and a fair
       | amount of computation and storage. But customers have legit
       | reasons for wanting these docs to only be accessible by their own
       | employees. And they don't want to dedicate their own staff to
       | improving such a system. No one wants to hear that their model
       | retraining ran out of memory, etc. So shipping a simple system
       | which doesn't improve but doesn't have moving parts becomes a
       | local optimum.
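
As a rough illustration of the click-feedback idea in the comment above
(Alice clicks doc3 for X today, so doc3 ranks higher for Bob tomorrow), here
is a minimal Python sketch. It keeps per-(query, doc) click counts, adds a
log-scaled boost to a base relevance score, and includes an NDCG helper of
the kind used to measure ranking quality. The class name, the boost formula,
and the idea of a numeric base score are assumptions for illustration, not
something Confluence or any other product is known to do.

```python
import math
from collections import defaultdict


class ClickBoostedRanker:
    """Re-rank base search results using per-(query, doc) click counts."""

    def __init__(self, boost=1.0):
        self.boost = boost                # assumed tuning knob
        self.clicks = defaultdict(int)    # (query, doc_id) -> click count

    def record_click(self, query, doc_id):
        self.clicks[(query.lower(), doc_id)] += 1

    def rerank(self, query, scored_docs):
        """scored_docs: list of (doc_id, base_score). Returns doc ids, best first."""
        q = query.lower()

        def final_score(item):
            doc_id, base = item
            # log1p dampens the effect of very popular documents
            return base + self.boost * math.log1p(self.clicks.get((q, doc_id), 0))

        ordered = sorted(scored_docs, key=final_score, reverse=True)
        return [doc_id for doc_id, _ in ordered]


def ndcg(ranked_relevances, k=10):
    """NDCG@k for graded relevance labels listed in ranked order."""

    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0


ranker = ClickBoostedRanker()
base = [("doc1", 1.0), ("doc2", 0.9), ("doc3", 0.8)]
ranker.record_click("X", "doc3")          # Alice's click
print(ranker.rerank("X", base))           # Bob's results, with doc3 boosted
```

Even this toy version shows the storage concern raised above: the click
table grows with the number of distinct (query, doc) pairs.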
        
       | VWWHFSfQ wrote:
       | In my experience almost everything that Atlassian makes is total
       | garbage. Bitbucket, Jira, Confluence, etc. are all horribly slow
       | to the point of being unusable and most of it has very poor
       | UI/UX. I pretty much don't recommend anything they make. It's not
       | surprising at all that a fundamental feature of a wiki, _search_,
       | doesn't work very well.
        
         | noja wrote:
         | That's what everyone has said about every piece of enterprise
         | software ever.
        
           | walrus01 wrote:
           | wait until you see _medical_ enterprise software or defense
           | industry enterprise software
        
           | acdha wrote:
           | It happens any time the buyer isn't the user. Atlassian
           | products are terrible because some manager buys them and
           | tells everyone they have to use it, and if the engineers
           | complain they'll probably just blow it off as "they're too
           | demanding" or "they don't want to do Agile right".
        
           | kube-system wrote:
           | It's the incentives that are in place. Most enterprises buy
           | products based on feature sets. Therefore, enterprise
           | software companies prioritize delivering features.
        
           | m463 wrote:
           | I remember what a friend said about software.
           | 
           | The desktop people want the latest and greatest software ASAP
           | if not sooner.
           | 
           | The server people want nothing to change, ever.
           | 
           | I'm sure enterprise software has similar rules and
           | incentives.
        
         | chrisseaton wrote:
         | Atlassian products feel like raw database frontends. I feel
         | like each screen in each Atlassian product is always exactly a
         | database table, being presented to me as an auto-generated
         | form. Might as well use SQL directly.
        
           | mdoms wrote:
           | That really couldn't be further from the truth, especially in
           | Jira. Jira keeps virtually every piece of interesting
           | information in a custom field, including built-in fields like
           | issue titles and points (known as system fields but
           | effectively the same thing). Every view you see is the
           | product of a zillion complicated joins across field
           | definitions, field schemes, field values, field permissions
           | and other bits and pieces.
        
           | [deleted]
        
           | sophacles wrote:
           | Why use a single query language when one for each view is
           | possible?
           | 
           | - Atlassian probably
        
           | 0xffff2 wrote:
           | The truly impressive feat (of Jira in particular, but also
           | all of Atlassian's products in general) is how incredibly
           | slow they are. I assume each page somehow touches every
           | single row of every single table in the database because I
           | don't know what else it could be doing to make page loads
           | take so long.
        
             | ironmagma wrote:
             | It's artificially slow to get you to upgrade. Wish I was
             | joking. Thankfully my company uses Clubhouse/Shortcut which
             | is orders of magnitude better.
        
         | hotpxl wrote:
         | So true. The way Atlassian hijacks browser keyboard hotkeys in
         | Jira/Confluence/Bitbucket is purely infuriating.
        
         | PaulHoule wrote:
         | I have seen self-hosted Jira installs that took 20+ sec to load
         | a page.
         | 
         | Today I use one that they host and there is nothing wrong with
         | it.
        
           | nicoburns wrote:
           | We used the hosted version. It would lag on the order of
           | seconds while trying to type in the issue description box. We
           | switched to another issue management software.
        
           | jfrunyon wrote:
           | I tried their hosted version for a bit on their 30-day trial
           | or whatever.
           | 
           | Virtually every page load took upwards of 5 seconds.
        
         | CodeAndCuffs wrote:
         | IMO bitbucket is okay. Its UX for PRs is amazing, 1000x better
         | than GitHub's. Especially its side by side diff.
         | 
         | This concludes, and fully encompasses, everything good that I
         | have to say about Atlassian products.
        
           | cosmotic wrote:
           | We use bitbucket cloud, and the PR UX is awful. Which version
           | are you using? Are you using a browser extension or
           | something? Compared to UpSource or GitHub, Bitbucket PRs are
           | very rough.
        
             | globular-toast wrote:
             | Bitbucket Cloud and Bitbucket On-prem are two entirely
             | separate products. It makes about as much sense as you can
             | expect from Atlassian. The former was a Mercurial thing
             | that they purchased then later removed Mercurial support.
             | The latter used to be called Stash.
             | 
             | We moved from Bitbucket On-prem to Gitlab and I must admit
             | I do miss parts of Bitbucket's UI. It was much easier to
             | find reviews you needed to do and it was much clearer when
             | reviewers had finished reviewing and if work needed to be
             | done. Gitlab should just copy this stuff.
        
               | jschumacher wrote:
               | I was the head of product for the developer tools at
               | Atlassian in 2012. We thought long and hard about taking
               | Bitbucket cloud and packaging it in a VM (which is what
               | GitHub did at the time) or leveraging the platforms we've
               | already built for Confluence and Jira that would give us
               | access control and a plug-in system from day 1. It was a
               | tough call.
               | 
               | Ultimately we've decided to build on top of our server
               | platforms and target companies with 1000+ employees from
               | day one. That decision had a huge impact on how we
               | approached performance and what features we prioritised.
               | The hierarchy of projects and permissions associated with
               | them as well as the way we designed Pull Requests are
               | good examples of that.
               | 
               | It was the right decision at the time, even if the
               | product happened to be different in cloud and server,
               | which did lead to some confusion. But Stash customers
               | were really happy with the product.
        
           | sam_lowry_ wrote:
           | Try to use Intellij's Github plugin. It does wonders.
        
           | rdw wrote:
           | Atlassian bought Bitbucket after it was already mature.
           | That's why!
        
             | jschumacher wrote:
             | Not quite. Bitbucket was acquired in 2011, only supported
             | Mercurial and was missing a lot of features, including the
             | pull request available today.
        
             | jschumacher wrote:
             | Bitbucket Server, which some people are referring to here,
             | was built from the ground up, tailored to a self hosting
             | environment.
        
         | omgtehlion wrote:
         | Well, they bought Trello and ruined it too :(
        
           | FreezerburnV wrote:
           | What's wrong with Trello? It still seems to run fast? And has
           | some new stuff added that seems to be useful? Dunno, still
           | seems to be fine to me.
        
         | marcodiego wrote:
         | They are garbage for developers but managers love it. Guess who
         | decides in the end?
        
         | kvathupo wrote:
         | +1, Bitbucket search often returns results from older versions
         | of a repo. Wouldn't be an issue if syncing to the current
         | master didn't take a few days...
        
       | abridgett wrote:
       | It didn't really seem to have any prioritisation - e.g. around
       | titles, headings or any metadata (view count, edits, last
       | updates). Agree completely it was awful.
       | 
       | OTOH I'm also a believer that you should be able to navigate to
       | the right information.
       | 
       | People seem to think that writing pages is sufficient. A library
       | works because pages are gathered in books, organised by sections
       | and has an army of librarians to keep it running smoothly.
       | 
       | I treat documentation like code - DRY and refactoring apply just the
       | same. e.g. I might split a page up so that some common part can
       | be re-used. I'll cull obsolete information or mark it obsolete.
       | I'll _also_ update headings to help them show up in searches.
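
The kind of prioritisation around titles and headings that the comment above
finds missing can be sketched as field-weighted term counting. This is a
minimal Python sketch; the weights, the field names, and the page structure
are assumptions chosen for illustration, not how Confluence scores pages.

```python
# Assumed weights: a hit in the title counts more than one in the body.
FIELD_WEIGHTS = {"title": 5.0, "headings": 2.0, "body": 1.0}


def score(page, query):
    """Sum weighted occurrences of each query term across the page's fields."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = page.get(field, "").lower().split()
        total += weight * sum(tokens.count(t) for t in terms)
    return total


def rank(pages, query):
    """Return titles of matching pages, highest score first."""
    scored = [(score(p, query), p["title"]) for p in pages]
    scored = [item for item in scored if item[0] > 0]
    return [title for _, title in sorted(scored, key=lambda item: -item[0])]


pages = [
    {"title": "Meeting notes", "headings": "",
     "body": "deploy deploy deploy was discussed"},
    {"title": "Deploy guide", "headings": "Overview", "body": "how to ship"},
]
print(rank(pages, "deploy"))  # the page with the term in its title comes first
```

Metadata such as view count or last update could be folded in as extra
terms in `score`, again with weights that would need tuning.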
        
       | Kalanos wrote:
       | Confluence search is great! I could always find what I needed. In
       | fact it's my favorite feature about Confluence. I'd say it's my
       | favorite search outside of Google.
        
       | mdoms wrote:
       | I used to work at Atlassian but NOT on Confluence and I have no
       | special information about this. But I can tell you that
       | internally it is well known how awful the search is - they run
       | one of the biggest known instances of Confluence - and there have
       | been many spikes and projects to improve it. I have spoken to
       | lots of people and asked why it continues to be so bad but all I
       | get is hand-waving about how it's such a hard problem.
       | 
       | Honestly I wish I knew more but it was like pulling teeth trying
       | to get people there to speak openly about why it's so hard when
       | it is solved in so many other products.
        
       | hyperation wrote:
       | Same experience for me. However, I started to be more diligent on
       | tagging each Confluence page whenever I see them lacking and that
       | definitely helps with the searches.
        
       | leetrout wrote:
       | So I am interested in this space. There are some alternatives out
       | there but I suspect companies will be concerned with letting a
       | 3rd party have access to the data needed. If you are interested
       | in this space and would be willing to chat with me about what
       | you're looking for OR what you are currently using I'd love to
       | chat! My email is my username at gmail.com
       | 
       | Some existing tooling:
       | 
       | Google cloud search has a confluence connector
       | https://developers.google.com/cloud-search/docs/connector-di...
       | 
       | Elastic workplace search has a connector.
       | https://www.elastic.co/guide/en/workplace-search/current/wor...
       | 
       | Lessonly has / had a thing called Obie
       | https://www.lessonly.com/blog/how-to-search-better-in-conflu...
       | 
       | Raytion https://www.raytion.com/connectors/raytion-confluence-
       | connec...
        
         | svcrunch wrote:
         | Hi there. One of the first customers we had (zir-ai.com) asked
         | for help building a better JIRA search.
         | 
         | I think neural-network powered search will be the long-term
         | solution for Wiki search specifically, and SaaS search more
         | generally.
         | 
         | Keyword has too many failure cases, and works poorly when
         | there's not a lot of data, or when searching through content
         | authored by others.
         | 
         | I'll contact you offline. Would love to hear more about your
         | experience in this area.
        
       | CPLX wrote:
       | This thread is well timed, I was just about to pick a wiki
       | solution and was leaning towards confluence. But search is really
       | important to me.
       | 
       | What's the prevailing wisdom these days on the best solution for
       | an internal knowledge base/wiki platform?
        
         | sethammons wrote:
         | As for an ok way to manage internal knowledge, I've yet to see
         | it. I've wanted to try out the Johnny Decimal System because if
         | you can create a solid hierarchy of a filing system, everyone
         | should be able to drill down to the right doc. Confluence
         | search doesn't work. Neither does google docs. I think I now
         | want the ability to just pull a local copy of a section of
         | docs, say "all engineering," and just use grep locally.
        
         | denysvitali wrote:
         | Markdown + Grep
        
         | polote wrote:
         | I'm working on one, V1 is going to be released in a few days
         | (you can find the link in my profile). It is meant to be a big
         | improvement to Confluence if your goal is to organize the
         | knowledge at company or department level. If you are a smaller
         | team, Notion is what I would recommend as long as you are
         | smaller than 100 people
        
       | boyter wrote:
       | I found the search pretty iffy at times. There was an existing
       | marketplace app for it that was not much better so I wrote my
       | own. Then turned it into a full marketplace app so others could
       | benefit.
       | 
       | It does partial matches anywhere in a word, supports every
       | language even in the same document, and even has regex support
       | for those who need it. Updates instantly with instant filters.
       | 
       | It can find things like 168.0 in 192.168.0.1 which the existing
       | confluence search cannot for example. Or search for AKIA
       | credentials /AKIA[A-Z0-9]{16}/. I have heard people describe it as
       | Algolia for confluence which makes me happy.
       | 
       | https://marketplace.atlassian.com/apps/1225034/better-instan...
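
To make the difference concrete, here is a small Python sketch contrasting a
whole-token match with the partial-match and regex searches described in the
comment above. It is a naive linear scan over an in-memory dict, written
only to show the behaviour; the function names and sample pages are
hypothetical, and nothing here reflects how the marketplace app or
Confluence is implemented. The key in the sample data is a made-up string in
the AKIA format, not a real credential.

```python
import re


def token_search(pages, term):
    """Whole-token match, roughly what a word-based index gives you."""
    term = term.lower()
    return [title for title, body in pages.items()
            if term in body.lower().split()]


def substring_search(pages, needle):
    """Match the needle anywhere, including inside a longer token."""
    needle = needle.lower()
    return [title for title, body in pages.items() if needle in body.lower()]


def regex_search(pages, pattern):
    """Match a regular expression anywhere in the page body."""
    rx = re.compile(pattern)
    return [title for title, body in pages.items() if rx.search(body)]


PAGES = {
    "Network notes": "The gateway is 192.168.0.1 on the office LAN",
    "Onboarding": "Ask the helpdesk for a laptop",
    "Leaked key": "old key " + "AKIA" + "A1B2C3D4E5F6G7H8" + " must be rotated",
}

print(token_search(PAGES, "168.0"))               # no whole token equals 168.0
print(substring_search(PAGES, "168.0"))           # found inside 192.168.0.1
print(regex_search(PAGES, r"AKIA[A-Z0-9]{16}"))   # credential-shaped strings
```

A real implementation would need an index (for example n-grams) to avoid
scanning every page on every query.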
        
       | phone8675309 wrote:
       | ysk: You can save sites for reference later if you don't want to
       | create a page in Confluence to do it:
       | https://support.atlassian.com/confluence-cloud/docs/save-a-p...
       | 
       | If you want best of both worlds, you can use the "Favorite Pages
       | Macro" on any page to reference all of the pages that you have
       | saved for later, which makes keeping that page up to date with
       | your latest changes to saved pages trivial.
        
       | polote wrote:
       | Searching corporate wiki is pretty difficult, because contrary to
       | something like Google, you can't use context of a search query to
       | recommend content.
       | 
       | * First, you have only a few occurrences of the same search query
       | in your search history (because only a few people searched
       | similar words in the past)
       | 
       | * You also can't use synonyms or remove stop words to recommend
       | better content (IT can mean "information technology" or the
       | pronoun, THE can be an acronym, ...).
       | 
       | So basically the only thing you can do is search words.
       | Confluence is worse than that because it tries to remove stop
       | words and do things that break exact match search. But this is a
       | difficult job. Ways to improve search: allow multi titles, index
       | with tags, attributes, only do exact word match, allow users to
       | suggest content for a specific search query, search
       | autocompletion, live search while typing ... (many things
       | that Confluence doesn't care about). You also have to respect
       | rights when returning documents: each document can have rights
       | from the folder or the document itself, inherited from team
       | access or user access, so this is really computation intensive
       | too, or you have to pre-compute rights
       | 
       | (Working on a competitor [0] of Confluence and I have put plenty
       | of hours of work on that specific issue, and I can tell you this
       | is really hard)
       | 
       | [0] https://dokkument.com
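
The point above about stop words breaking exact matches (IT as "information
technology" versus the pronoun) can be shown with a few lines of Python.
This is a toy analyzer under stated assumptions: the stop-word list, the
whitespace tokenizer, and the all-terms-must-match rule are illustrative
choices, not Confluence's actual analysis chain.

```python
# Assumed stop-word list; real analyzers ship longer, language-specific lists.
STOP_WORDS = {"the", "it", "can", "a", "an", "of", "is", "to"}


def analyze(text, remove_stop_words):
    """Lowercase, split on whitespace, and optionally drop stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def matches(query, doc, remove_stop_words):
    """True if every surviving query token occurs among the doc's tokens."""
    q = analyze(query, remove_stop_words)
    d = set(analyze(doc, remove_stop_words))
    return bool(q) and all(t in d for t in q)


it_page = "IT onboarding checklist for new hires"
sales_page = "onboarding checklist for sales"

# With stop-word removal, "IT" vanishes from the query, so both pages match.
print(matches("IT onboarding", it_page, True),
      matches("IT onboarding", sales_page, True))
# Without it, only the page that really mentions IT matches.
print(matches("IT onboarding", it_page, False),
      matches("IT onboarding", sales_page, False))
```

A query consisting only of "IT" is even worse off with stop-word removal:
no tokens survive, so nothing can match.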
        
         | klyrs wrote:
         | Confluence _does_ search while typing, it's just _so abysmally
         | slow_ that you typically won't get a result until you've
         | stopped typing.
        
         | oconnore wrote:
         | It seems like there ought to be some recognition that these are
         | business tools, and ought to be designed with power users in
         | mind. Instead, "search" in B2B products is built with the same
         | uber-minimalist UX as B2C search.
         | 
         | Even early Google had more power user features than a typical
         | B2B product search bar.
         | 
         | Boolean expressions (NOT, OR, AND), exact match strings, links-
         | to, linked-from, in-folder/category, etc. should be mandatory
         | for these workflows. Better if you can include search queries
         | as live page content, as in Notion & Height.
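
For a sense of how little machinery the Boolean operators mentioned above
need, here is a minimal Python sketch that evaluates AND / OR / NOT queries
against page text. The nested-tuple query representation and the whitespace
tokenizer are assumptions made to keep the example short; a real search box
would also need a parser for the typed query string.

```python
def evaluate(query, tokens):
    """Evaluate a query against a set of lowercase tokens.

    query is either a term (str) or a tuple:
    ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q).
    """
    if isinstance(query, str):
        return query.lower() in tokens
    op, *args = query
    if op == "AND":
        return all(evaluate(a, tokens) for a in args)
    if op == "OR":
        return any(evaluate(a, tokens) for a in args)
    if op == "NOT":
        return not evaluate(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")


def search(pages, query):
    """Return titles of pages whose body satisfies the Boolean query."""
    return [title for title, body in pages.items()
            if evaluate(query, set(body.lower().split()))]


pages = {
    "Linux VPN": "vpn setup for linux",
    "Windows VPN": "vpn setup for windows",
    "Printers": "printer setup",
}
print(search(pages, ("AND", "vpn", ("NOT", "windows"))))
```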
        
           | polote wrote:
           | Knowledge management is still a neglected area in most
           | companies. No money => few players. Confluence has been
           | there for years with almost no competition. Notion has
           | emerged recently but is not really a good fit for medium to
           | large companies. As a result Confluence is not worried and
           | doesn't have to improve its product.
           | 
           | Power users are a small share of users of knowledge
           | management software, so it is difficult to build a system
           | only for them. Most people just type a few words and give up
           | if they don't find the result in the first 5 results.
        
             | dragosbulugean wrote:
             | We're also trying to build something in the space with
             | www.archbee.io, a YC company.
        
             | oconnore wrote:
             | > Power users are a small share of users of knowledge
             | management software, so it is difficult to build a system
             | only for them
             | 
             | In practice, knowledge management at companies is a
             | specialization. There are <5% of employees that go around
             | and document/organize things for everyone else. Most
             | employees are passively consuming information and
             | information hierarchies built by someone else.
             | 
             | If you're not building tools for those power users, you're
             | not building for creating and organizing content in your
             | system at all.
             | 
             | As an example of how nuts this is, managers at my company
             | regularly try out various search terms, create index
             | documents, and do "internal SEO" to optimize how other
             | employees will discover documents. This isn't a byzantine
             | environment like public web search is, why do I have to
             | hack around the wiki's default notion of page relevance?
        
               | polote wrote:
               | Well it depends on what you are talking about. Usually
               | people who produce content are power users. But people
               | who search content, as you said, are the other 95% of
               | users, and these are the ones who also need a search
               | relevant to them.
               | 
               | My belief is that knowledge management can't exist
               | without power users, which we call "admins", these are
               | the ones responsible to make sure content is well
               | organized for others and create content if necessary.
               | Those people need specific tools to do their job well,
               | which to me is more something that you can have in an
               | admin interface while all the users use the basic
               | interface.
               | 
               | Those tools have two sets of users, admins (curators,
               | creators, organizers) and regular users. We need a
               | different interface for both. And that's exactly what we
               | are working on.
               | 
               | > This isn't a byzantine environment like public web
               | search is, why do I have to hack around the wiki's
               | default notion of page relevance?
               | 
               | That's exactly why I suggested having multiple titles: when
               | you have that and you make it easy to suggest new
               | titles for a document, anyone who finds a document can
               | suggest the query terms they used, and that can benefit
               | other users
        
       | dangoor wrote:
       | I agree. This is why I've tried to make use of Confluence's other
       | tools to make content findable and also improve search...
       | 
       | 1. give pages labels. This lets you insert a label-based index,
       | and also makes it possible to narrow search by label
       | 
       | 2. use spaces. Separate the content into spaces based on who is
       | likeliest to need that information. You can narrow search by
       | space, and put a search box on the page in the space.
       | 
       | 3. use the hierarchy. You have to put the pages somewhere in the
       | hierarchy anyway, so try to make it reasonable.
       | 
       | 4. Make useful index pages. Obviously, this doesn't scale, but if
       | you can provide people with useful starting points, it will help
       | them. For example, at Khan Academy we have a space for the whole
       | org with a front page to get you to every team's front page. The
       | engineering team has a front page with a small collection of
       | useful & commonly-used links
       | 
       | 5. if you have a page in your hierarchy with a lot of content
       | underneath it, add a search box on that page that constrains the
       | search to that set of pages.
       | 
       | The biggest problem Confluence search has is that it's terrible
       | with relevance, and using its tools to narrow down the search can
       | improve the relevance of the results considerably.
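
Narrowing by space and label, as in points 1, 2 and 5 above, can also be
written as a CQL (Confluence Query Language) string. The sketch below only
builds the query text. The space key "ENG", the label "runbook", and the
helper names are made up for illustration, and the quoting rule is an
assumption to check against the CQL documentation for your version.

```python
def quote(value):
    """Quote a CQL string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_cql(text, space=None, label=None, ancestor_id=None):
    """Combine a full-text clause with optional narrowing clauses."""
    clauses = [f"text ~ {quote(text)}"]
    if space:
        clauses.append(f"space = {quote(space)}")
    if label:
        clauses.append(f"label = {quote(label)}")
    if ancestor_id is not None:
        # Restrict results to pages underneath one page in the hierarchy.
        clauses.append(f"ancestor = {int(ancestor_id)}")
    return " AND ".join(clauses)


cql = build_cql("deploy rollback", space="ENG", label="runbook")
```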
        
       | Krssst wrote:
       | In my understanding, you have to prefix all your keywords with
       | "+" for all of them to be necessary for a page to be included in
       | your results. This makes the behavior slightly closer to Google.
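
A small helper for the "+" convention described above (the same convention
Lucene's classic query syntax uses for required terms). This is a sketch
that splits on whitespace only, so quoted phrases are not handled, and
`require_all_terms` is a hypothetical name.

```python
def require_all_terms(query):
    """Prefix each whitespace-separated term with '+' unless it already
    carries a '+' or '-' prefix."""
    terms = query.split()
    return " ".join(
        term if term.startswith(("+", "-")) else "+" + term for term in terms
    )


required = require_all_terms("hello world")
```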
        
       | RegW wrote:
       | I'm amazed to see this here.
       | 
       | My colleagues and I have been grumbling for ages that our
       | instance of Confluence must be really badly configured. If you
       | put in a single word search term, there will be lots of results,
       | but no guarantee that any pages containing that word in the title
       | (or body), will appear above ones where it doesn't.
       | 
       | The search problem was solved long ago by Apache Solr/Lucene.
       | Although this may not be true for multiple languages.
        
       | EamonnMR wrote:
       | I crossed a huge milestone last week. I actually found something
       | I was looking for in Confluence.
        
       | PaulHoule wrote:
       | Most search engines are pretty bad because the developers of most
       | search engines don't do any work to improve relevance.
       | 
       | This methodology works
       | 
       | https://ccc.inaoep.mx/~villasen/bib/AN%20OVERVIEW%20OF%20EVA...
       | 
       | and I used it to tune up the relevance of a search engine for
       | patents to the point where users could immediately perceive that
       | it worked better than other products.
       | 
       | After I worked on that I wound up talking to the developers
       | and/or marketing people for many enterprise search engines and
       | few of them, if any, did any kind of formal benchmarking of
       | relevance.
       | 
       | People at one firm told me that they used to go to TREC
       | conferences because they thought it got them visibility but that
       | they decided it didn't so they quit going.
       | 
       | A message I got repeatedly was that these firms thought that the
       | people who bought the search engines didn't care much about
       | relevance, but they did care about there being 200 or more plug-
       | ins to import data from various sources.
       | 
       | In principle the tuning is unique to the text corpus. One reason
       | for that is that there is a balancing act of having a search
       | engine that prefers small documents (they have spiky vectors that
       | look more like query vectors) or large documents (they have so
       | many words they match everything.) Different corpuses have
       | different distributions of document sizes, not to mention
       | different distributions of words that appear.
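
The short-versus-long document balance can be illustrated with Okapi BM25,
where the parameter `b` controls how strongly scores are normalized by
document length. BM25 is swapped in here as a common ranking function; the
comment does not say which scoring model was actually tuned, and the two
toy documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        # b=0 ignores length; b=1 normalizes fully by relative length.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query:
            tf = counts[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


short_doc = "solr tuning".split()
long_doc = ("solr " * 3 + "filler " * 40).split()
docs = [short_doc, long_doc]
query = ["solr"]

no_norm = bm25_scores(query, docs, b=0.0)    # favors the long document
full_norm = bm25_scores(query, docs, b=1.0)  # favors the short document
```

With `b=0.0` the long document wins because it simply repeats the term
more often. With `b=1.0` the short document wins because its single match
makes up half of its text.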
       | 
       | Few organizations are willing to do the work to tune up a search
       | engine (you have to decide about the relevance of 10,000+
       | document hits), but I've had the experience that you can beat the
       | pants off the defaults even using a generic tuning. For instance
       | that patent search engine was tuned up against the GOV2 corpus
       | instead of a patent corpus. A small patent corpus showed us we
       | were on the right track, however.
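
A sketch of what benchmarking relevance against human judgments can look
like, using precision at k and average precision, two measures commonly
reported in TREC-style evaluations. The linked paper is truncated above,
so this is not a claim about its exact method. The document ids and
judgments are invented.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top k results that were judged relevant."""
    top = ranked_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


judged_relevant = {"d1", "d4"}                 # human relevance judgments
default_ranking = ["d3", "d1", "d2", "d4"]     # engine with default settings
tuned_ranking = ["d1", "d4", "d3", "d2"]       # engine after tuning

ap_default = average_precision(default_ranking, judged_relevant)
ap_tuned = average_precision(tuned_ranking, judged_relevant)
```

Averaging such scores over many judged queries gives a single number to
compare before and after each tuning change.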
        
       | thedogeye wrote:
       | It's unbelievably bad. This is literally the only thing you need
       | a wiki for. I can't believe this is the market leader. Notion is
       | going to crush them.
        
       | simonw wrote:
       | The good news here is that the Confluence API is actually really
       | good, and very easy to integrate with.
       | 
       | I wrote a custom search engine that worked by running on cron,
       | pulling in all of the content from Confluence and writing it into
       | a SQLite table with SQLite full-text search enabled (using
       | https://sqlite-utils.datasette.io/en/stable/python-api.html#...),
       | then sticking a https://datasette.io/ interface in front of it.
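
A rough sketch of the indexing half of that setup. It uses Python's
standard `sqlite3` module with an FTS5 table instead of the sqlite-utils
library linked above, and it assumes the local SQLite build includes FTS5.
The page records are invented, and a real job would pull them from the
wiki's API. The `bm25()` column weights are illustrative, not tuned.

```python
import sqlite3

# Hypothetical page records: (id, title, body).
pages = [
    (1, "Deploy runbook", "Steps to deploy and roll back the API service."),
    (2, "Meeting notes 2014", "We talked about the deploy schedule and lunch."),
    (3, "Onboarding", "Laptop setup, accounts, and team contacts."),
    (4, "Expense policy", "How to file travel expenses."),
    (5, "On-call rotation", "Who is paged this week and how to swap shifts."),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE pages_fts USING fts5(title, body)")
conn.executemany(
    "INSERT INTO pages_fts(rowid, title, body) VALUES (?, ?, ?)", pages
)


def search(query, limit=10):
    """Return (id, title) rows, best match first."""
    # bm25() returns lower values for better matches, so ascending order
    # puts the best hit first. Title matches are weighted 10x body matches.
    rows = conn.execute(
        "SELECT rowid, title FROM pages_fts "
        "WHERE pages_fts MATCH ? "
        "ORDER BY bm25(pages_fts, 10.0, 1.0) LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


results = search("deploy")
```

Weighting the title column is one cheap way to keep old meeting notes from
outranking the page that is actually named after the search term.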
        
         | dangoor wrote:
         | It seems to me that the big problem with Confluence search
         | (once you have a lot of pages) is that the results have poor
         | relevance ranking. Wouldn't tossing the content into SQLite
         | have the same problem?
        
         | bartread wrote:
         | On one level that's great and I'm certainly glad you made it
         | work.
         | 
         | On another level, and bearing in mind that Confluence is a paid
         | product, this absolutely should not be necessary and competent
         | search is something that Atlassian should provide out of the
         | box.
         | 
         | (Yes, I have beef with Confluence, but in my case it's
         | primarily due to the historically awful editing experience.)
        
         | CodeAndCuffs wrote:
         | The API for writing docs/content to Confluence is the worst
         | I've ever seen. You are expected to use their custom syntax
         | which then gets converted again before rendering.
         | 
         | The docs for the POST content literally say to write what you
         | want in Confluence's WYSIWYG, then do a GET API call to see what
         | it should look like.
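
For context, a sketch of the kind of request body involved. The shape
below follows the Confluence REST content resource as I understand it
(`type`, `title`, `space.key`, `body.storage`), so verify it against the
API docs for your version. The space key, title, and helper name are
hypothetical, and nothing is sent over the network.

```python
import html
import json


def build_page_payload(space_key, title, paragraphs):
    """Build a JSON-serializable body for creating a page in storage format."""
    # Storage format is XHTML-like markup, so plain text must be escaped.
    storage = "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "body": {"storage": {"value": storage, "representation": "storage"}},
    }


payload = build_page_payload("ENG", "Search tips", ["Use labels & spaces."])
body_json = json.dumps(payload)
```

Anything richer than paragraphs (macros, tables, mentions) needs the
storage-format markup for that element, which is why the write-then-GET
workflow described above ends up being the practical reference.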
        
       | irvingprime wrote:
       | Compared to Jira search, Confluence search is quite good.
        
       | sideproject wrote:
       | I use BitBucket, because it's free and I've been using it for a
       | long time. Maybe GitHub is faster, but I don't access BitBucket
       | enough to justify migrating ~50 repos I have. Can't be bothered.
       | Its UI/UX? meh. I got used to it.
       | 
       | I use Confluence and Jira because, again, we use them at work. So
       | I guess I'm using them because I have to. I also understand it's
       | a pain to move our company from one to another (oh we've had
       | discussions to move to Coda and others) but again, I'm not taking
       | on that project. Again, UI/UX, search - all meh - they are
       | working and I got used to it.
       | 
       | The inconvenience of using them does not justify the amount of
       | time I need to spend to overcome my inconvenience. Some things,
       | you just have to let them slide.
        
       | nitwit005 wrote:
       | I don't think it's unusually bad. Rather, if an app offers open
       | ended search, it will generally generate fairly poor results.
        
         | pornel wrote:
         | No, it really is exceptionally bad even among half-assed search
         | implementations.
         | 
         | For a start, it interprets multiple words in a query as an OR.
         | You search for "hello world", you get "hello nobody" and
         | "goodbye world" in the search results.
         | 
         | It also always applies stemming, which mangles technical terms.
         | At Cloudflare we have a daemon called "cloudflared" and it's
         | impossible to find it in the damn wiki.
         | 
         | If it even tries to do any prioritization, it's
         | indistinguishable from random. I search for a project's name, I
         | get fragment of meeting notes from 7 years ago, not the
         | project's homepage.
         | 
         | And the UI is unusably awful too. The fancy-ajaxy JS overlay
         | breaks the Back button, so if you click on an irrelevant result
         | (and all of them are irrelevant), pressing back doesn't go back
         | to search results, but instead makes you lose the document you
         | were on.
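
The first two complaints can be reproduced with a toy matcher. This is
only an illustration of OR-versus-AND matching and of how suffix stripping
collapses distinct technical terms. `naive_stem` is a deliberately crude
stand-in, not the stemmer Confluence uses.

```python
def tokenize(text):
    return text.lower().split()


def naive_stem(token):
    """Toy suffix stripper, only to show the failure mode."""
    for suffix in ("ing", "ed", "es", "s", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def match(query, doc, mode="or", stem=False):
    """Return True if the document matches any (or) or all (and) query terms."""
    query_terms = tokenize(query)
    doc_terms = tokenize(doc)
    if stem:
        query_terms = [naive_stem(t) for t in query_terms]
        doc_terms = [naive_stem(t) for t in doc_terms]
    found = [term in doc_terms for term in query_terms]
    return any(found) if mode == "or" else all(found)


docs = ["hello nobody", "goodbye world", "hello world"]
or_hits = [d for d in docs if match("hello world", d, mode="or")]
and_hits = [d for d in docs if match("hello world", d, mode="and")]

# With stemming, the daemon name and the company name become the same token.
stem_collision = naive_stem("cloudflared") == naive_stem("cloudflare")
```

OR matching returns all three documents, AND matching returns only "hello
world", and once both names stem to the same token every page mentioning
the company matches a search for the daemon.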
        
           | boyter wrote:
           | If possible please try this
           | https://marketplace.atlassian.com/apps/1225034/better-
           | instan... and let me know how it goes for you. No stemming
           | applied, no term expansion etc... The back button issue
           | exists (not sure if possible to fix that as a plugin), but I'd
           | suggest opening results in a new tab to solve that issue.
        
       | Cryptonic wrote:
       | Yes it only finds you crap results. Not sure why they have the
       | most naive search algorithm out there. Maybe good search needs
       | more AI and CPU power than we think.
       | 
       | Maybe this is something Google should take on: a search plugin
       | for Confluence where Google's crawlers log in from time to time
       | for internal crawling, to enable non-public search requests on
       | that data. That would boost knowledge workers' efficiency a lot.
       | I hope somebody from Google reads this and takes on the
       | challenge. I'm sure companies would pay a lot for this.
        
         | leetrout wrote:
         | This is a thing that exists already for Google Cloud Search
         | 
         | https://workspace.google.com/products/cloud-search/
         | 
         | https://marketplace.atlassian.com/apps/1212945/google-cloud-...
        
       | deevin9 wrote:
       | My company uses Coveo [www.coveo.com] for their intranet. They
       | have a native connector for Confluence, it works MUCH better:
       | https://docs.coveo.com/en/1716/index-content/install-the-cov...
        
       | staplung wrote:
       | It's been a long time since I worked at Google but when I did (10
       | yrs ago), the search system for the intranet was notoriously
       | awful. Part of the reason was that PageRank tends not to work so
       | well in places where things aren't heavily cross-linked, which is
       | a hard place to get to if your search system already sucks.
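
A small power-iteration PageRank makes the cross-linking point concrete.
This is a textbook sketch, not Google's implementation, and the two toy
link graphs are invented.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outgoing links."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical intranet with no cross-links at all.
unlinked = pagerank({"a": [], "b": [], "c": [], "d": []})
# Same pages, but three of them link to "a".
linked = pagerank({"a": [], "b": ["a"], "c": ["a"], "d": ["a"]})
```

Without links every page ends up with the same score, so the ranking
carries no signal. Only when pages link to each other does one of them
stand out.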
        
         | modeless wrote:
         | I always found those complaints funny. Google's internal search
         | was and is light years ahead of every other company's. Those
         | complaints were probably coming from people who never worked at
         | any other large company and were expecting internal search to
         | be as good as web search despite the relatively tiny corpus.
        
       | dmpanch wrote:
       | We are using Confluence for our public and internal wikis. Its
       | search is bad and it is really slow, but no matter how much
       | everyone hates it, the market does not provide worthy
       | alternatives.
       | 
       | When choosing 3 years ago, we used the following criteria:
       | 
       | * WYSIWYG editor. Writing documentation must take minimal effort
       | for any user
       | 
       | * Flexible access permissions to various parts of the
       | documentation. Public documentation is open to anonymous users,
       | the internal one is divided into many sections with access for
       | certain groups
       | 
       | * Multilingual support. Not out of the box, but possible with
       | plugins
       | 
       | * Multilingual pdf export. In some markets, some customers prefer
       | to have exported manuals
       | 
       | * The ability to inherit articles. We need to be able to make
       | edits once, instead of duplicating the same articles
       | 
       | * Have a relatively modern appearance. Wiki engines are familiar
       | to many because the whole world uses Wikipedia, but this does not
       | make them more pleasing to the eyes, if I can say so
       | 
       | 3 years have passed, I periodically look at alternatives, so far
       | only wiki.js seems like a good solution but it's not even close
       | yet.
        
         | jfrunyon wrote:
         | > the market does not provide worthy alternatives.
         | 
         | MediaWiki?
        
       | marcodiego wrote:
       | Let's stop asking "why closed feature in closed product works so
       | bad?" type of questions. The only appropriate answer is: because
       | customers continue to use it.
        
         | josephcsible wrote:
         | > customers continue to use it
         | 
         | The people who make the decision to buy Confluence aren't the
         | ones who have to use it.
        
       | BuyMyBitcoins wrote:
       | On a Confluence that covers the whole of the Fortune 500 company
       | I work for, I do NOT want to search over the corpus of _all_ the
       | documents hosted on it. I want a persistent search filter where I
       | can easily restrict my results within certain parameters without
       | having to constantly re-filter my results.
       | 
       | I think most search engine designers want to make the index as
       | broad as possible, but the problem seems to be that people
       | _rarely_ want such broad searches. What they really want are very
       | detailed indices and metadata implications over well trodden
       | folders.
        
       ___________________________________________________________________
       (page generated 2021-09-20 23:02 UTC)