[HN Gopher] Claude Integrations
___________________________________________________________________
Claude Integrations
Author : bryanh
Score : 690 points
Date : 2025-05-01 16:02 UTC (1 day ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| behnamoh wrote:
| That "Allow for this chat" pop up should be optional. It ruins
| the entire MCP experience. Maybe make it automatic for non-
| mutating MCP tools.
| pcwelder wrote:
| In the latest update they've replaced "Allow for this chat"
| with "Always Allow".
| avandekleut wrote:
| MCP also has support for "hints" which note whether an action
| is destructive.
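|
| For reference, here's roughly what a tool entry returned from
| tools/list looks like with those annotations, per the MCP spec
| (expressed as a Python dict; the tool name and schema are made
| up for illustration):
|
|     # Sketch of an MCP tool definition with safety hints.
|     tool = {
|         "name": "delete_record",
|         "description": "Delete a record by id",
|         "inputSchema": {
|             "type": "object",
|             "properties": {"id": {"type": "string"}},
|         },
|         "annotations": {
|             "readOnlyHint": False,     # does more than read
|             "destructiveHint": True,   # may destroy data
|             "idempotentHint": False,   # repeat calls differ
|         },
|     }
|
| A client could auto-approve calls where readOnlyHint is true
| and only pop the confirmation for destructive ones.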
| arjie wrote:
| The cookie-banner-style constant Allow, Allow, Allow makes
| their client unusable. Are there any alternative desktop MCP
| clients?
| rahimnathwani wrote:
| https://github.com/patruff/ollama-mcp-bridge
| jarbus wrote:
| Anyone have any data on how effective models are at leveraging
| MCP? Hard to tell if these things are a buggy mess or a game
| changer
| striking wrote:
| Claude Code is doing pretty well in my experience :) I've built
| a tool in our CI environment that reads Jira tickets, files
| GitHub PRs, etc. automatically. Great for one-shotting bugs,
| and it's only getting better.
| xnx wrote:
| Integrations are nice, but the superpower is having an AI smart
| enough to operate a computer/keyboard/mouse so it can do anything
| without the cooperation/consent of the service being used.
|
| Lots of people are making moves in this space (including
| Anthropic), but nothing has broken through to the mainstream.
| WillAdams wrote:
| Or even access multiple files?
|
| Why can't one set up a prompt, test it against a file, then
| once it is working, apply it to each file in a folder in a
| batch process which then provides the output as a single
| collective file?
| xnx wrote:
| You can probably achieve what you want with
| https://github.com/simonw/llm and a little bit of command
| line.
|
| Not sure what OS you're on, but in Windows it might look like
| this:
|
| FOR %%F IN (*.txt) DO (TYPE "%%F" | llm -s "execute this
| prompt" >> "output.txt")
| WillAdams wrote:
| I want to work with PDFs (or JPEGs), but that should be a
| start, I hope.
| xnx wrote:
| llm supports attachments too
|
| FOR %%F IN (*.pdf) DO (llm -a "%%F" -s "execute this
| prompt" >> output.txt)
| TheOtherHobbes wrote:
| I've just done something similar with Claude Desktop and its
| built-in MCP servers.
|
| The limits are still buggy responses - Claude often gets
| stuck in a useless loop if you overfeed it with files - and
| lack of consistency. Sometimes hand-holding is needed to get
| the result you want. And it's slow.
|
| But when it works it's amazing. If the issues and limitations
| were solved, this would be a complete game changer.
|
| We're starting to get somewhat self-generating automation and
| complex agenting, with access to all of the world's public
| APIs and search resources, controlled by natural language.
|
| I can't see the edges of what could be possible with this.
| It's limited and clunky for now, but the potential is
| astonishing - at least as radical an invention as the web
| was.
| WillAdams wrote:
| I would be fine with storing the output from one run,
| spooling up a new one, then concatenating after multiple
| successive runs.
| pglevy wrote:
| I've been using Claude Desktop with built-in File MCP to run
| operations on local files. Sometimes it will do things
| directly but usually it will write a Python script. For
| example: combine multiple .md files into one or organize
| photos into folders.
|
| I also use this method for doing code prototyping by giving
| it the path to files in the local working copy of my repo.
| Really cool to see it make changes in a vite project and it
| just hot reloads. Then I make tweaks or commit changes as
| usual.
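|
| The scripts it writes for the markdown case are usually just a
| few lines, something like (a sketch of typical output):
|
|     import pathlib
|
|     # Concatenate every .md file in notes/ into one document,
|     # separated by horizontal rules.
|     files = sorted(pathlib.Path("notes").glob("*.md"))
|     combined = "\n\n---\n\n".join(
|         f.read_text(encoding="utf-8") for f in files
|     )
|     pathlib.Path("combined.md").write_text(
|         combined, encoding="utf-8"
|     )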
| arnaudsm wrote:
| I often get rate-limited or blocked from websites because I
| browse them too fast with my keyboard and mouse. The AI would
| be slowed down significantly.
|
| LLM-desktop interfaces make great demos, but they are too slow
| to be usable in practice.
| xnx wrote:
| Good point. Probably makes sense to think of it as an
| assistant you assign a job to and get results back later.
| boh wrote:
| I think all the retail LLMs are working to broaden the available
| context, but in most practical use-cases it's having the ability
| to minimize and filter the context that would produce the most
| value. Even a single PDF with too many similar datapoints leads
| to confusion in output. They need to switch gears from the high
| growth, "every thing is possible and available" narrative, to one
| that narrows the scope. The "hallucination" gap is widening with
| more context, not shrinking.
| mikepurvis wrote:
| That's a tough pill to swallow when your company valuation is
| $62B, based on the premise that you're building a bot capable of
| transcendent thought, ready to disrupt every vertical in
| existence.
|
| Tackling individual use-cases is supposed to be something for
| third party "ecosystem" companies to go after, not the
| mothership itself.
| Etheryte wrote:
| This has been my experience as well. The moment you turn
| internet access on, Kagi Assistant starts outputting garbage.
| Turn it off and you're all good.
| fhd2 wrote:
| Definitely my experience. I manage context like a hawk, be it
| with Claude-as-Google-replacement or LLM integrations into
| systems. Too little and the results are off. Too much and the
| results are off.
|
| Not sure what Anthropic and co can do about that, but
| integrations feel like a step in the wrong direction. Whenever
| I've tried tool use, it was orders of magnitude more expensive
| and generally inferior to a simple model call with curated
| context from SerpApi and such.
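|
| For the curious, "curated context" here just means: fetch a few
| search snippets, paste them into one prompt. A minimal sketch
| with SerpApi's Python client and Anthropic's SDK (the query,
| model name, and keys are placeholders):
|
|     from serpapi import GoogleSearch
|     import anthropic
|
|     results = GoogleSearch(
|         {"q": "example query", "api_key": "SERPAPI_KEY"}
|     ).get_dict()
|     snippets = "\n".join(
|         f"- {r['title']}: {r.get('snippet', '')}"
|         for r in results["organic_results"][:5]
|     )
|
|     client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY
|     msg = client.messages.create(
|         model="claude-3-5-sonnet-latest",
|         max_tokens=1024,
|         messages=[{
|             "role": "user",
|             "content": f"Context:\n{snippets}\n\nAnswer: ...",
|         }],
|     )
|     print(msg.content[0].text)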
| loufe wrote:
| Couldn't agree more. I wish all major model makers would
| build tools into their proprietary UIs to "summarize contents
| and start a new conversation with that base". My biggest
| slowdown with working with LLMs while coding is moving my
| conversation to a new thread because the context limit is hit
| (Claude) or the coherent-thought threshold is exceeded
| (Gemini).
| fhd2 wrote:
| I never use any web interfaces, just hooked up gptel (an
| Emacs package) to Claude's API and a few others I regularly
| use, and I just have a buffer with the entire conversation.
| I can modify it as needed, spawn a fresh one quickly etc.
| There are also features to add files and individual snippets,
| but I usually manage it all in a single buffer. It's a
| powerful text editor, so efficient text editing is a given.
|
| I bet there are better / less arcane tools, but I think
| powerful and fast mechanisms for managing context are key
| and for me, that's really just powerful text editing
| features.
| medhir wrote:
| you hit the nail on the head. my experience with prompting LLMs
| is that providing extra context that isn't explicitly needed
| leads to "distracted" outputs
| ketzo wrote:
| I mean, to be honest, they gotta do both to achieve what
| they're aiming for.
|
| A truly useful AI assistant has context on my last 100,000
| emails - and also recalls the details of each individual one
| perfectly, without confusion or hallucination.
|
| Obviously I'm setting a high bar here; I guess what I'm saying
| is "yes, and"
| energy123 wrote:
| There's a niche for the kitchen sink approach. It's a type of
| search engine.
|
| Throw in all context --> ask it what is important for problem
| XYZ --> curate what it tells you, and feed that to another
| model to actually solve XYZ
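|
| The nice thing is the two-stage version is only a few lines to
| wire up (a sketch using the llm library; the model names are
| placeholders):
|
|     import llm
|
|     big_context = open("everything.txt").read()
|     curator = llm.get_model("gemini-2.5-pro")  # big window
|     solver = llm.get_model("gpt-4o")           # actual work
|
|     # Stage 1: ask what is important for the problem.
|     relevant = curator.prompt(
|         f"{big_context}\n\nList only facts relevant to XYZ."
|     ).text()
|     # Stage 2: solve with just the curated context.
|     print(solver.prompt(
|         f"Context:\n{relevant}\n\nSolve XYZ."
|     ).text())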
| roordan wrote:
| This is my concern as well. How successful is it in selecting
| the correct tool out of hundreds or thousands?
|
| Unlike what this integration is pushing, for LLM usage in
| production-grade products where high accuracy (99%) is a
| requirement, you have to give a very limited tool set to get
| any degree of success.
| bredren wrote:
| Had been planning a custom MCP for our org's Jira.
|
| I'm a bit skeptical that it's gonna work out of the box because
| of the number of custom fields that seem to be involved to make
| successful API requests in our case.
|
| But I would welcome not having to solve this problem. Jira's
| interface is among the worst of all the ticket tracking
| applications I have encountered.
|
| But I have found that using an LLM conversation paired with
| enough context about what is involved for successful POSTs
| against the API allows me to create, update, and relate issues
| via curl.
|
| It's begging for a chat based LLM solution like this. I'd just
| prefer the underlying model not be locked to a vendor.
|
| Atlassian should be solving this for its customers.
| viraptor wrote:
| You can also do the same thing locally:
https://github.com/sooperset/mcp-atlassian Either with the
| Claude app, or some other system with any tool-using LLM you
| want.
| bredren wrote:
| I'm familiar with that MCP and was planning to build on top
| of it.
|
| I hadn't realized but the new integration seems to actually
| just be an official, closed-source MCP produced *by*
| Atlassian.
|
| sooperset's MCP is MIT licensed, so I wonder how much of the
| Atlassian edition is just a lift of that.
|
| There's a comment [1] on the actual integration page asking
| about custom fields, which I think is possibly a big issue.
|
| At first I thought the open-source version would get crushed
| by an actual Atlassian release, but not if Atlassian doesn't
| offer all the support for it to work really well no matter
| what customizations are fitted into each instance.
|
| My hypothesis is that it takes custom code to make this work,
| and using an off-the-shelf one for Jira won't work. Hoping to be
| proven wrong though, as it would be less work for me on that
| front.
|
| [1] https://community.atlassian.com/forums/Atlassian-Platform-ar...
| rubenfiszel wrote:
| I feel dumb but how do you actually add Zapier or Confluence or
| custom MCP on the web version of Claude? I only see it for
| Drive/Gmail/Github. Is it zoned/slow release?
| throwaway314155 wrote:
| edit: <Incorrect>I'm fairly certain these additions only work
| on Claude Desktop?</Incorrect>
|
| That or they're pulling an OpenAI and launching a feature that
| isn't actually fully live.
| rubenfiszel wrote:
| But the videos show Claude web
| 85392_school wrote:
| This part seems relevant:
|
| > in beta on the Max, Team, and Enterprise plans, and will soon
| be available on Pro
| joshwarwick15 wrote:
| Created a list of remote MCP servers here so people can keep
| track of new releases -
| https://github.com/jaw9c/awesome-remote-mcp-servers
| zhyder wrote:
| Is there any way to access this via the API, after perhaps some
| oauth from the Anthropic user account?
| throwup238 wrote:
| The leapfrogging is getting insane (in a good way, I guess?).
| The amount of time each state-of-the-art feature gets before
| it's supplanted is down to a few weeks at this point.
|
| LLMs were always a fun novelty for me until OpenAI DeepResearch
| which started to actually come up with useful results on more
| complex programming questions (where I needed to write all the
| code by hand but had to pull together lots of different libraries
| and APIs), but it was limited to 10/month for the cheaper plan.
| Then Google Deep Research upgraded to 2.5 Pro with paid usage
| limits of 20/day, which allowed me to just throw everything at it
| to the point where I'm still working through reports that are a
| week or more old. Oh and it searched up to 400 sources at a time,
| significantly more than OpenAI which made it quite useful in
| historical research like identifying first edition copies of
| books.
|
| Now Claude is releasing the same research feature with
| integrations (excited to check out the Cloudflare MCP auth
| solution and hoping Val.town gets something similar), and a run
| time of up to 45 minutes. The pace of change was overwhelming
| half a year ago, now it's just getting ridiculous.
| user_7832 wrote:
| I agree with your overall message - rapid growth appears to
| encourage competition and forces companies to put their best
| foot forward.
|
| However, unfortunately, I cannot shower much praise on Claude
| 3.7. And if you (or anyone) asks why - 3.7 seems much better
| than 3.5, surely? - then I'm moderately sure that you use
| Claude much more for coding than for any kind of conversation.
| In my opinion, even 3.5 _Haiku_ (which is available for free
| during high loads) is better than 3.7 Sonnet.
|
| Here's a simple test. Try asking 3.7 to intuitively explain
| anything technical - say, mass-dominated vs spring-dominated
| oscillations. I'm a mechanical engineer who studied this stuff
| and _I_ could not understand 3.7's analogies.
|
| I understand that coders are the largest single group of
| Claude's users, but Claude went from being my most used app to
| being used only after both ChatGPT and Gemini, something that I
| absolutely regret.
| airstrike wrote:
| I too like 3.5 better than 3.7 and I use it pretty often.
| It's like 3.7 is better in 2 metrics but worse in 10
| different ones
| joshstrange wrote:
| I use Claude mostly for coding/technical things and something
| about 3.7 does not feel like an upgrade. I haven't gone back
| to 3.5 (mostly started using Gemini Pro 2.5 instead).
|
| I haven't been able to use Claude research yet (it's not
| rolled out to the Pro tier) but o1 -> o3 deep research was a
| massive jump IMHO. It still isn't perfect but o1 would often
| give me trash results but o3 deep research actually starts to
| be useful.
|
| 3.5->3.7 (even with extended thinking) felt like a
| nothingburger.
| mattlutze wrote:
| The expectation that one model be top marks for all things
| is, imo, asking too much.
| tiberriver256 wrote:
| 3.7 did score higher in coding benchmarks but in practice 3.5
| is much better at coding. 3.7 ignores instructions and does
| things you didn't ask it to do.
| UncleEntity wrote:
| I think it just does that to eat up your token quota and
| get you to upgrade.
|
| Like, ask it a simple question and it comes up with a full
| repo, complete with a README and a Makefile, when all you
| wanted to know was how efficient a particular algorithm
| would be in the included code.
|
| Can't wait until they add research to the Pro plan because,
| you know, I have questions...
| vineyardmike wrote:
| > I think it just does that to eat up your token quota
| and get you to upgrade.
|
| If you pay for a subscription then they don't have an
| incentive to use more tokens for the same answer.
|
| It's definitely because feedback from people has "taught"
| it that more boilerplate is better. It's the same reason
| ChatGPT is annoyingly complimentary.
| spaceman_2020 wrote:
| 3.7 is too overactive
|
| I prefer Gemini 2.5 pro for all code now
| conception wrote:
| 2.5 is my "okay Claude can't get it" but first I check my
| "bank account" to see if I can afford it.
| ralusek wrote:
| Isn't 2.5 pro significantly cheaper?
| yunwal wrote:
| They're the same price, and Gemini has a large free tier.
| hombre_fatal wrote:
| Gemini 2.5 Pro has solved problems that Claude 3.7
| cannot, so I use it for the hard stuff.
|
| But Gemini is at least as overactive as Claude, sometimes
| even more overactive when it comes to something like
| comment spam.
|
| Of course, this can be fixed with prompting. And
| sometimes it feels sheepish complaining about the machine
| god doing most of my chore work that didn't even exist a
| couple years ago.
| suyash wrote:
| That has been the most annoying thing about it; so glad I'm
| not paying for it anymore.
| danw1979 wrote:
| Can't you still use Sonnet 3.5 anyway? Or is that a
| paying-subscriber-only feature?
| sannee wrote:
| I suspect that is precisely why it got better at coding
| benchmarks.
| garrickvanburen wrote:
| My current hypothesis: the more familiar you are with a topic
| the worse the results from any LLM.
| jeswin wrote:
| > My current hypothesis: the more familiar you are with a
| topic the worse the results from any LLM.
|
| That's not really true, since your prompts are also getting
| better. "Better input leads to better output" remains true,
| even with LLMs (when you see them as a tool).
| franga2000 wrote:
| Being more familiar with the topic definitely doesn't
| always make your prompts better. For a lot of things it
| doesn't really change (explain X, compare X and Y...) -
| and this is what is being discussed here. For giving
| "building" instructions (like writing code) it helps a
| bit, but even if you know exactly what you want it to
| write, getting it to do that is pretty much trial and
| error (too much detail makes it follow word-for-word and
| produce bad code, too little and it misses important
| parts or makes dumb mistakes).
| jm547ster wrote:
| The opposite may be true: the more effective the model,
| the lazier the prompting, as it can seemingly handle not
| being micromanaged the way earlier versions required.
| mac-mc wrote:
| He was saying that 3.5 is better than 3.7 on the same topic
| he knows well tho.
| user_7832 wrote:
| That is certainly the case in niche topics where published
| information is lacking, or needs common sense to synthesize
| proper outputs [1].
|
| However in this specific example, I don't remember if it
| was ChatGPT or Gemini or 3.5 Haiku, but the other(s)
| explained it well enough. I think I re-asked 3.5 Haiku at a
| later point in time, and to my complete non-surprise, it
| gave an answer that was quite decent.
|
| 1 - For example, the field of DIY audio - which was funnily
| enough the source of my question. I'm no speaker designer,
| but combining creativity with engineering basics/rules of
| thumb seems to be something LLMs struggle with terribly.
| Ask them to design a speaker and they come up with the most
| vanilla, tired, textbook design - despite several existing
| market products that are already so much ahead/innovative.
|
| I'm confident that if you asked an LLM an identical
| question for which there _is_ more discourse - eg make an
| interesting /innovative phone - you'd get relatively much
| better results.
| terminalcommand wrote:
| I built open baffle speakers based on measurements and
| discussion I had with Claude. I think it is really good.
|
| I am a novice, maybe that's why I liked it.
| danw1979 wrote:
| Amen to this. As soon as you ask an LLM to explain
| something in detail that you're a domain expert in, that's
| when you notice the flaws.
| startupsfail wrote:
| Yes, it's particularly bad when the information found on
| the web is flawed.
|
| For example, I'm not a domain expert, but I was looking
| for an RC motor for a toy project, and OpenAI happily
| tried to source a few with Deep Research. Only the best
| candidate it picked contained an obvious typo in the
| motor spec (68 grams instead of 680 grams), which is just
| impossible for a motor of the specified dimensions.
| parineum wrote:
| > Yes, it's particularly bad when the information found
| on the web is flawed.
|
| It's funny you say that because I was going to echo your
| parent's sentiment and point out it's exactly the same
| with any news article you read.
|
| The majority of content these LLMs are consuming is not
| from domain experts.
| 91bananas wrote:
| I had it generate a baseball lineup the other day; it
| printed out a list of the 13 kids' names, then said (12
| players). It just straight up miscounted what it was doing,
| throwing a wrench into everything else it was doing beyond
| that point.
| eru wrote:
| Not really. I'm getting pretty good Computer Science theory
| out of Gemini and even ChatGPT.
| bsenftner wrote:
| It is like this with expert humans too. Which is why, no
| matter what, we will continue to _require_ expert humans
| not just "in the loop" but as the critical cogs that are
| the loop itself, just as it has always been. However, this
| time around those people will have AI augmentation, and be
| intellectual athletes of a nature our civilization has
| never seen.
| simsla wrote:
| I always tell people to trust the LLM to the same extent as
| an intern. Avoid giving it tasks you cannot verify the
| correctness of.
| fastball wrote:
| Seems clear to me that Claude 3.7 suffers from overfitting,
| probably due to Anthropic seeing that 3.5 was a smash hit in
| the LLM coding space and deciding their North star for 3.7
| should be coding benchmarks (which, like all benchmarks, do
| not properly capture the process of real-world coding).
|
| If it was actually good they would've named it 4.0; the fact
| that they went from 3.5 to 3.7 (weird jump) speaks volumes
| imo.
| snewman wrote:
| The numbering jump is because there was "Claude 3.5" and
| then "Claude 3.5 (new)" and they decided to retroactively
| stop the madness and rename the latter to 3.6 (which is what
| everyone was calling it anyway).
| csomar wrote:
| Plateauing overall, but apparently you can gain in certain
| directions while you lose in others. I wrote an article a
| while back arguing that current models are not that far from
| GPT-3.5: https://omarabid.com/gpt3-now
|
| 3.7 is definitely better at coding, but you feel it lost a
| bit of maneuverability in other domains. For someone who
| wants code generated, it doesn't matter, but I've found myself
| using DeepSeek first and then getting code output from 3.7.
| ilrwbwrkhv wrote:
| None of those reports are any good though. Maybe for shallow
| research, but I haven't found them deep. Can you share what
| kind of research you have been trying there where it has done a
| great job of actual deep research?
| Balgair wrote:
| I'm echoing this sentiment.
|
| Deep Research hasn't really been that good for me. Maybe I'm
| just using it wrong?
|
| Example: I want the precipitation in mm and monthly high and
| low temperature in C for the top 250 most populous cities in
| North America.
|
| To me, this prompt seems like a pretty anodyne and obvious
| task for Deep Research. It's long, tedious, but mostly coming
| from well structured data sources (wikipedia) across two
| languages at most.
|
| But when I put this in to any of the various models, I mostly
| get back ways to go and find that data myself. Like, I know
| how to look at Wikipedia, it's that I don't want to comb
| through 250 pages manually or try to write a script to handle
| all the HTML boxes. I want the LLM/model to do this days long
| tedious task for me.
| 85392_school wrote:
| The funny thing is that if your request only needed the top
| 100's temperature or the top 33's precipitation, it could
| just read "List of cities by average temperature" or "List
| of cities by average precipitation" and that would be it,
| but the top 250 requires reading 184x more pages.
|
| My perspective on this is that if Deep Research can't do
| something, you should do it yourself and put the results on
| the internet. It'll help other humans and AIs trying to do
| the same task.
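|
| If you do end up doing it yourself, the per-page scrape is
| mostly pandas boilerplate (a sketch; the table index varies
| per page, which is exactly the tedious part):
|
|     import pandas as pd
|
|     url = ("https://en.wikipedia.org/wiki/"
|            "List_of_cities_by_average_temperature")
|     tables = pd.read_html(url)  # one DataFrame per wikitable
|     north_america = tables[1]   # index depends on page layout
|     print(north_america.head())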
| Balgair wrote:
| Yeah, that was intentional, well, somewhat.
|
| The project requires the full list of every known city in
| the western hemisphere and also Japan, Korea, and Taiwan.
| But that dataset is just maddeningly large, if it is
| possible at all. Like, I expect it to take me years, as I
| have to do a lot of translations. So, I figured that I'd
| be nice and just ask for the top 250 from the various
| models.
|
| There's a lot more data that we're trying to get too and
| I'm hoping that I can get approval to post it as it's a
| work thing.
| wyre wrote:
| If you have the data, but need to parse all of it,
| couldn't you upload it to your LLM of choice (with a
| large enough context window) and have it finish your
| project?
| XenophileJKO wrote:
| Well, remember that listing/ranking things is structurally
| hard for these models because you have to keep track of
| what has been listed and what hasn't, etc.
| Balgair wrote:
| I'm sorry I was unclear. No, I do not have the data yet
| and I need to get it.
| therein wrote:
| Sounds like you're having it conduct research and
| then solve the Knapsack problem for you on the collected
| data. We should do the same for the traveling salesman
| one.
|
| How do you validate its results in that scenario? Just
| take its word for it?
| Balgair wrote:
| Ahh, no. We'll be doing more research on the data once we
| have it. Things like ranking and averages and
| distributions on the data will come later, but first we
| just need the data to begin with.
| sxg wrote:
| That's actually not what deep research is for, although you
| can obviously use it however you like. Your query is just
| raw data collection--not research. Deep research is about
| exploring a topic primarily with academic and other high-
| quality sources. It's a starting point for your own
| research. Deep research creates a summary report in ~10 min
| from more sources than you could probably read in a month,
| and then you can steer the conversation from there.
| Alternatively, you can just use deep research's sources as
| a reading list for yourself so you can do your own
| analysis.
| Balgair wrote:
| I think we have very different definitions of the word
| 'research' then.
|
| I'd say that what you're saying is 'synthesis'. The
| 'Intro/Discussion' sections of a journal article.
|
| For me, 'research' means the work of going through and
| getting all the data in the first place. Like, going out
| and collecting dino bones in the hot sun, measuring all
| the soil samples, etc. - that is research. For me, asking
| these models to go collate some webpages, I mean, you
| spend the first weeks of a summer undergrad's time to go
| do this kind of thing to get them used to the file systems
| and spruce up their organization skills, see where they
| are at. Writing the paper up, that's part of research
| sure, but not the hard part that really matters.
| sxg wrote:
| Agreed--we're working with different definitions of
| "research". The deep research products from OpenAI,
| Google Gemini, and Perplexity seem to be more aligned
| with my definition of research if that helps you gain
| more utility from them.
| tomrod wrote:
| It's excellent at producing short literature reviews on
| open access papers and data. It has no sense of judgment,
| trusting most sources unless instructed otherwise.
| fakedang wrote:
| Gemini's Deep Research is very good at discriminating
| between sources though, in my experience (haven't tried
| Claude or Perplexity). It finds really obscure but very
| relevant documents that don't even show up in Google
| Search for the same queries. It also discounts results
| that are otherwise irrelevant or very low-value from the
| final report. But again, it is just a starting point as
| the generated report is too short, and I make sure to
| check all the references it gives once again. But that's
| where I find its value.
| spaceman_2020 wrote:
| My wife, who is writing her PhD right now and teaches
| undergraduate students, says they are at the level of a
| really bright final year undergrad
|
| Maybe in a year, they'll hit the graduate level. But we're
| not near PhD level yet
| xrdegen wrote:
| It is because you are just such a genius who already knows
| everything, unlike us stupid people who find these tools
| amazingly useful and informative.
| cwillu wrote:
| The failure mode is that people unfamiliar with a subject
| aren't able to distinguish careful analysis from bullshit.
| However the second failure mode where someone pointing that
| out is assumed to be calling people stupid is a
| longstanding wetware bug.
| greymalik wrote:
| Out of curiosity - can you give any examples of the programming
| questions you are using deep research on? I'm having a hard
| time thinking of how it would be helpful and could use the
| inspiration.
| dimitri-vs wrote:
| Easy: for any research task that will take you 5 minutes to
| complete, it's worth firing off a Deep Research request while
| you work on something else in parallel.
|
| I use it a lot when documentation is vague or outdated. When
| Gemini/o3 can't figure something out after 2 tries. When I am
| working with a service/API/framework/whatever that I am very
| unfamiliar with and I don't even know what to Google search.
| jerpint wrote:
| Have you tried using llms.txt when available? Very useful
| resource
| emorning3 wrote:
| I often use Chrome to validate what I think I know.
|
| I recently asked Chrome to show me how to apply the Knuth-
| Bendix completion procedure to propositional logic, and I had
| already formed my own thoughts about how to proceed (I'm
| building a rewrite system that does automated reasoning).
|
| The response convinced me that I'm not a total idiot.
|
| I'm not an academic and I'm often wrong about theory so the
| validation is really useful to me.
| scargrillo wrote:
| That's a perfect example of LLMs providing epistemic
| scaffolding -- not just giving you answers, but helping you
| check your footing as you explore unfamiliar territory.
| Especially valuable when you're reasoning through something
| structurally complex like rewrite systems or proof
| strategies. Sometimes just seeing your internal model
| reflected back (or gently corrected) is enough to keep you
| moving.
| miki_oomiri wrote:
| "Chrome" ? What do you mean? Gemini?
| risyachka wrote:
| What are you talking about?
|
| It has literally stagnated for a year now.
|
| All that's changed is they connect more APIs.
|
| And add a thinking loop with the same model powering it.
|
| This is the reason it seems fast - nothing really happens
| except easy things.
| tymscar wrote:
| I totally agree with you, especially if you actually try
| using these models, not just looking at random hype posters
| on twitter or skewed benchmarks.
|
| That being said, isn't it strange how the community has polar
| opposite views about this? Did anything like this ever happen
| before?
| itissid wrote:
| I've been using it for pre-scoping things I have no idea about
| and rapidly iterating by refeeding it a version with guard
| rails and conditions from previous chats.
|
| Like when I wanted to scope how to build a homemade TrueNAS
| Scale unit: it helped me avoid pitfalls, like knowing that I
| needed two GPUs minimum to run the OS and local LLMs, and sped
| up config for a CLI backup of my Dropbox locally (it told me to
| use the right filesystem format over ZFS to make the Dropbox
| client work).
|
| It has covered everything from researching how I can structure
| my web app for building a payment system on the web (something
| I knew nothing about) to writing small tools that talk to my
| document collection and index it into Anki collections, in one
| day.
| wilg wrote:
| o3, since it can web search while reasoning, is a really
| useful lighter-weight deep research.
| spaceman_2020 wrote:
| Gemini 2.5 pro was the moment for me where I really thought
| "this is where true adoption happens"
|
| All those talks about AI replacing people seemed a little far
| fetched in 2024. But in 2025, I really think models are getting
| good enough
| antupis wrote:
| You still need "human in the loop" because with simple tasks
| or some tasks that have lots of training material, models can
| one-shot answer and are like super good. But if the domain
| grows too complex, there are some not-so-obvious
| dependencies, or stuff that is in bleeding edge. Models fail
| pretty badly. So you need someone to split those complex
| tasks to more simpler familiar steps.
| iLoveOncall wrote:
| Calling some APIs is leap-frogging? You could do this with
| GPT-3, nothing has changed except it's branded under a new name
| and tries to establish a (flawed) standard.
|
| If there was truly any innovation still happening in OpenAI,
| Anthropic, etc., they would be working on models only, not on
| side features that someone could already develop over a
| weekend.
| never_inline wrote:
| Why would you love on-call though?
| iLoveOncall wrote:
| In my previous team most of our oncall requests came from
| bug reports by customers on various tools that we owned, so
| to be able to work on random tools that my team owned was a
| nice change of pace / scenery compared to working on the
| same thing for 3 months uninterrupted.
|
| Now I'm in a new team where 99% of our oncall tickets come
| from automated alarms and 80% of them are a subset of a few
| issues where the root-cause isn't easy to address but there
| is either nothing to actually do once investigated, or the
| fix is a one time process that is annoying to run, so the
| username isn't accurate anymore :)
|
| I still like the change of pace though, 0 worries about
| sprint tasks or anything else for a week every few months.
| apwell23 wrote:
| > DeepResearch which started to actually come up with useful
| results on more complex programming questions
|
| Is there a YouTube video of people using this on complex open
| source projects like the Linux kernel, or maybe something
| like PyTorch?
|
| How come none of the OSS projects (at least not the ones I
| follow) are progressing fast(er) from AI like 'deep research'?
| WhitneyLand wrote:
| The integrations feel so RAG-ish. It talks, tells you it's going
| to use a tool, searches, talks about what it found...
|
| Hope one day it will be practical to do nightly finetunes of a
| model per company with all core corporate data stores.
|
| This could create a seamless native model experience that knows
| about (almost) everything you're doing.
| pyryt wrote:
| I would love to do this on my codebase after every commit
| notgiorgi wrote:
| why is finetuning talked about so much less than RAG? is it not
| viable at all?
| mring33621 wrote:
| I'm not an expert in either, but RAG is like dropping some
| 'useful' info into the prompt context, while fine tuning is
| more like performing a mix of retraining, appending re-
| interpretive model layers, and/or brain surgery.
|
| I'll leave it to you to guess which one is harder to do.
| disgruntledphd2 wrote:
| RAG is much cheaper to run.
| computerex wrote:
| It's significantly harder to get right; it's a very big
| stepwise increase in technical complexity over in-context
| learning/RAG.
|
| There are now some light versions of fine tuning that don't
| update all the model weights but train a small adapter layer
| (LoRA), which is way more viable commercially atm in my
| opinion.
| ijk wrote:
| There were initial difficulties in finetuning that made it
| less appealing early on, and that's snowballed a bit into
| having more of a focus on RAG.
|
| Some of the issues still exist, of course:
|
| * Finetuning takes time and compute; for one-off queries
| using in-context learning is vastly more efficient (i.e.,
| look it up with RAG).
|
| * Early results with finetuning had trouble reliably
| memorizing information. We've got a much better idea of how
| to add information to a model now, though it takes more
| training data.
|
| * Full finetuning is very VRAM intensive; optimizations like
| LoRA were initially good at transferring style and not
| content. Today, LoRA content training is viable but requires
| training code that supports it [1].
|
| * If you need a very specific memorized result and it's
| costly to get it wrong, good RAG is pretty much always going
| to be more efficient, since it injects the exact text in
| context. (Bad RAG makes the problem worse, of course).
|
| * Finetuning requires more technical knowledge: you've got to
| understand the hyperparameters, avoid underfitting and
| overfitting, evaluate the results, etc.
|
| * Finetuning requires more data. RAG works with a handful of
| datapoints; finetuning requires at least three orders of
| magnitude more data.
|
| * Finetuning requires extra effort to avoid forgetting what
| the model already knows.
|
| * RAG works pretty well when the task that you are trying to
| perform is well-represented in the training data.
|
| * RAG works when you don't have direct control over the model
| (i.e., API use).
|
| * You can't finetune most of the closed models.
|
| * Big, general models have outperformed specialized models
| over the past couple of years; if it doesn't work now, just
| wait for OpenAI to make their next model better on your
| particular task.
|
| On the other hand:
|
| * Finetuning generalizes better.
|
| * Finetuning has more influence on token distribution.
|
| * Finetuning is better at learning new tasks that aren't as
| present in the pretraining data.
|
| * Finetuning can change the style of output (e.g.,
| instruction training).
|
| * When finetuning pays off, it gives you a bigger moat (no
| one else has that particular model).
|
| * You control which tasks you are optimizing for, without
| having to wait for other companies to maybe fix your problems
| for you.
|
| * You can run a much smaller, faster specialized model
| because it's been optimized for your tasks.
|
| * Finetuning + RAG outperforms just RAG. Not by a lot,
| admittedly, but there's some advantages.
|
| Plus the RL Training for reasoning has been demonstrating
| unexpectedly effective improvements on relatively small
| amounts of data & compute.
|
| So there's reasons to do both, but the larger investment that
| finetuning requires means that RAG has generally been more
| popular. In general, the past couple of years have been won
| by the bigger models scaling fast, but with finetuning
| difficulty dropping there is a bit more reason to do your own
| finetuning.
|
| That said, for the moment the expertise + expense + time of
| finetuning makes it a tough business proposition if you don't
| have a very well-defined task to perform, a large dataset to
| leverage, or other way to get an advantage over the multi-
| billion dollar investment in the big models.
|
| [1] https://unsloth.ai/blog/contpretraining
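|
| To make the LoRA option concrete, the core of a content-
| oriented LoRA finetune is only a few lines on top of the
| Hugging Face stack (a sketch; the base model and
| hyperparameters are illustrative, not recommendations):
|
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Llama-3.1-8B")
|     config = LoraConfig(
|         r=16, lora_alpha=32, lora_dropout=0.05,
|         # attention-only adapters mostly transfer style;
|         # include MLP modules too when teaching content
|         target_modules=["q_proj", "v_proj"],
|     )
|     model = get_peft_model(base, config)
|     model.print_trainable_parameters()  # usually <1% of weights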
| jimbokun wrote:
| So is this a good summary?
|
| 1. If you have a large corpus of valuable data not
| available to the corporations, you can benefit from fine
| tuning using this data.
|
| 2. Otherwise just use RAG.
| msp26 wrote:
| Thanks for the detailed comment.
|
| I had no idea that fine tuning for adding information is
| viable now. Last I checked (year+ back) it seemed to not
| work well.
| omneity wrote:
| RAG is infinitely more accessible and cheaper than
| finetuning. But it is true that finetuning is getting
| severely overlooked in situations where it would outperform
| alternatives like RAG.
| riku_iki wrote:
| > RAG is infinitely more accessible and cheaper than
| finetuning.
|
| it depends on your data access pattern. If some text goes
| through the LLM input many times, it is more efficient for the
| LLM to be finetuned on it once.
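|
| Back of the envelope, with made-up but plausible numbers: a
| 50k-token document prepended to 1,000 calls a day at $3 per
| million input tokens is 50,000 x 1,000 x $0.000003 = $150/day
| of repeated context, versus a one-time finetuning cost. Prompt
| caching changes the math, but a break-even point exists.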
| omneity wrote:
| This assumes the team deploying the RAG-based solution
| has equal ability to either engineer a RAG-based system
| or to finetune an LLM. Those are different skillsets and
| even selecting which LLM should be finetuned is a complex
| question, let alone aligning it, deploying it, optimizing
| inference etc.
|
| The budget question comes into play as well. Even if text
| is repetitively fed to the LLM, that spend is spread over a
| long enough time that, compared to finetuning (which is a
| sort of capex), it is financially more accessible.
|
| Now bear in mind, I'm a big proponent of finetuning where
| applicable and I try to raise awareness to the
| possibilities it opens. But one cannot deny RAG is a lot
| more accessible to teams which are likely developers / AI
| engineers compared to ML engineers/researchers.
| riku_iki wrote:
| > But one cannot deny RAG is a lot more accessible to
| teams which are likely developers / AI engineers compared
| to ML engineers/researchers.
|
| It looks like major vendors provide a simple API for fine-
| tuning, so you don't need ML engineers/researchers:
| https://platform.openai.com/docs/guides/fine-tuning
|
| Setting RAG infra is likely more complicated than that.
| omneity wrote:
| You are certainly right, managed platforms make
| finetuning much easier. But managed/closed model
| finetuning is pretty limited and in fact should be named
| "distribution modeling" or something.
|
| Results with this method are significantly more limited
| compared to all the power open-weight finetuning gives
| you (and the skillset needed in return).
|
| And in either case don't forget alignment and evals.
| retinaros wrote:
| Fine tuning can cost $80 and a few hours. A good RAG doesn't
| exist.
| never_inline wrote:
| Can fine tuning produce results as grounded as RAG?
|
| How many epochs do you run?
| onel wrote:
| You usually fine tune when you want to add capabilities (an
| output style, JSON output, function calling, etc.). You use
| RAG to add knowledge.
| VSerge wrote:
| Ongoing demo of integrations with Claude by a bunch of A-list
| companies: Linear, Stripe, Paypal, Intercom, etc.. It's live now
| on: https://www.youtube.com/watch?v=njBGqr-BU54
|
| In case the above link doesn't work later on, the page for this
| demo day is here: https://demo-day.mcp.cloudflare.com/
| mkagenius wrote:
| Are people really doing this MCP thing? Yikes. Tomorrow, let
| me reinvent CSS as model context design (MCD).
| warkdarrior wrote:
| Do you have a better solution to give models on-demand access
| to data sources?
| mkagenius wrote:
| You mean other than writing an API? No.
| cruffle_duffle wrote:
| And what is the protocol for the interface between the GPU-
| based LLM and the API? How does the LLM signal to make a
| tool call? What mechanism does it use?
|
| Because MCP isn't an API; it's the protocol that defines how
| the LLM even calls the API in the first place. Without it,
| all you've got is a chat interface.
|
| A lot of people misunderstand the role of MCP. It's the
| signaling the LLM uses to reach out of its context window
| and do things.
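|
| To make that concrete: on the wire, MCP is JSON-RPC 2.0. When
| the model decides to use a tool, the client sends the server a
| message like this (a sketch as a Python dict; the tool name
| and arguments are made up):
|
|     request = {
|         "jsonrpc": "2.0",
|         "id": 7,
|         "method": "tools/call",
|         "params": {
|             "name": "search_issues",
|             "arguments": {"query": "login bug", "limit": 5},
|         },
|     }
|     # The server's result carries content blocks that the
|     # client injects back into the model's context window.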
| turblety wrote:
| Is there a reason they went and built some new standard,
| rather than just using an HTTP API?
| knowaveragejoe wrote:
| You can use either HTTP or stdio.
| imbnwa wrote:
| Feels like middle management is gonna go well before engineers
| do at the current LLM rate of advancement.
| DebtDeflation wrote:
| That started a while ago. Google "the great flattening".
| 6stringmerc wrote:
| Feed Claude the data willingly to learn more about human behavior
| they can't scrape or obtain otherwise without consent? Hard pass.
| I'm not telling any AI any more about what it means to be a
| creative person because training it how to suffer will only
| further hurt my job prospects. Nice try, no dice.
| n_ary wrote:
| Is this the beginning of the apps-for-everything era, and
| does the SaaS for your LLM finally begin? Initially we had
| the internet, but the value came when webapps arrived to
| become SaaS instead of installed apps. Now if LLMs can use a
| specific remote MCP, which is another SaaS for your LLM, the
| remote-MCP-powered service can charge a subscription to do
| wonderful things and voila! Let the new golden age of SaaS
| for LLMs begin and the old fad (replace job XYZ with AI) die
| already.
| throwaway7783 wrote:
| MCP is yet another interface for an existing SaaS (like UI and
| APIs), but now magically "agent enabled". And $$$ of course
| clvx wrote:
| I'm more excited that I can now run a custom site, hook an MCP
| up to it, and have all the cool intelligence I used to pay
| SaaS for, without having to integrate with them, plus govern
| my own data. It's a massive win. I just see AI-assisted coding
| replicating current SaaS services that I can run internally.
| If my shop was on a specific stack, I could aim to have all my
| supporting apps in that stack using AI-assisted coding,
| simplifying operations, and be able to hook up MCPs to get
| intelligence from all of them.
|
| Truly, OSS should be more interesting in the next decade for
| this alone.
| heyheyhouhou wrote:
| We should all thank the Chinese companies for releasing so
| many incredible open-weight models. I hope they keep doing
| it; I don't want to rely on OpenAI, Anthropic, or Google for
| all my future computer interactions.
| achierius wrote:
| Don't forget Meta, without them we probably wouldn't have
| half the publicly available models we do today.
| naravara wrote:
| On one hand, yes this is very cool for a whole host of personal
| uses. On the other hand giving any company this level of access
| to as many different personal data sources as are out there
| scares the shit out of me.
|
| I'd feel a lot better if we had something resembling a
| comprehensive data privacy law in the United States because I
| don't want it to basically be the Wild West for anyone handling
| whatever personal info doesn't get covered under HIPAA.
| falcor84 wrote:
| Absolutely agreed, but just wanted to mention that it's
| essentially the same level of access you would give to
| Zapier, which is one of their top examples of MCP
| integrations.
| n_ary wrote:
| It took many years of online tracking, iframes, sticky
| cookies, and Cambridge Analytica before things like GDPR came
| into existence. We will similarly have to wait a few years for
| major leaks to happen through LLM pipelines/integrations.
| Sadly, that is the reality we live with.
| jimbokun wrote:
| The question is whether or not it happens before the
| emergence of Skynet.
| OtherShrezzing wrote:
| I'd love a _tip jar_ MCP, where the LLM vendor can
| automatically tip my website for using its
| content/feature/service in a query's response. Even if the
| amount is absolutely minuscule, in aggregate, this might make
| up for ad revenue losses.
| fredoliveira wrote:
| Not that exactly, but I just saw this on twitter a few
| minutes ago from Stripe:
| https://x.com/jeff_weinstein/status/1918029261430255626
| insin wrote:
| It's perfect, nobody will have time to care about how many 9s
| your service has because the nondeterministic failure mode now
| sitting slap-bang in the middle is their problem!
| Manfred wrote:
| Imagine dynamic subscription rates based on vibes where you
| won't even notice price hikes because not even the supplier
| can explain what they are.
| donmcronald wrote:
| > Now if LLMs can use specific remote MCP which is another SaaS
| for your LLM, the remote MCP powered service can charge a
| subscription to do wonderful things and voila!
|
| I've always worked under the assumption the best employees make
| themselves replaceable via well defined processes and high
| quality documentation. I have such a hard time understanding
| why there's so much willingness to integrate irreplaceable SaaS
| solutions into business processes.
|
| I haven't used AI a ton, but everything I've done has focused
| on owning my own context, config, etc.. How much are people
| going to be willing to pay if someone else owns 10+ years of
| their AI context?
|
| Am I crazy or is owning the context massively valuable?
| brumar wrote:
| Hello fellow context owner. I like my modules with their
| context.sh at their root level. If crafted with care, magic
| happens. Reciprocally, when AI derails, it's most often due
| to bad context management and fixed by improving it.
| drivingmenuts wrote:
| Is each Claude instance a separate individual, or is it a shared
| Because I'm not sure I would want an AI that learned about my
| confidential business information sharing that with anyone else,
| without my express permission.
|
| This does not sound like it would be learning general information
| helpful across an industry, but specific, actionable information.
|
| If not available now, is that something that AI vendors are
| working toward? If so, what is to keep them from using that
| knowledge to benefit themselves or others of their choosing,
| rather than the people they are learning from?
|
| While people understand ethics, morals, and legality (and ignore
| them), that does not seem like something that an AI understands
| in a way that might give it pause before taking an action.
| zoogeny wrote:
| I'm curious what kind of research people are doing that takes 45
| minutes of LLM time. Is this a poke at the McKinsey consultant
| domain?
|
| Perhaps I am just frivolous with my own time, but I tend to use
| LLMs in a more iterative way for research. I get partial answers,
| probe for more information, direct the attention of the LLM away
| from areas I am familiar with and towards areas I am less
| familiar with. I feel if I just let it loose for 45 minutes it
| would spend too much time on areas I do not find valuable.
|
| This seems more like a play for "replacement" rather than
| "augmentation". Although, I suppose if I had infinite wealth, I
| could kick off 10+ research agents each taking 45 minutes and then
| review their output as it became available, then kick off round
| 2, etc. That is, I could do my usual process, but
| asynchronously instead of interactively.
| throwup238 wrote:
| That iterative research process is exactly how I use Google
| Deep Research since it has a 20/day rate limit. Research a
| problem, notice some off hand assumption or remark the report
| made, and fire off another research run asking about it. It
| depends on what you work on; in my use case I often have to do
| hours of research for 30 minutes of work like when integrating
| a bunch of different vendors' APIs or poring over datasheets
| for EE, so it's worth firing off research and then working on
| something else for 10-20 minutes (it helps that the Gemini app
| fires off a push notification when the report is done -
| Anthropic please do this! Even for requests made from the web
| app).
|
| As for long research times, one thing I've been using it for is
| historical research on old books. Gemini DeepResearch was the
| first one able to properly explain the nuances of identifying a
| chimeral first edition Origin of Species after taking half an
| hour and reading 400 sources. It went into all the important
| details like spelling errors and the properties of chimeral
| FY2** copies found in various libraries around the world.
| abhisek wrote:
| Where is Skynet and when is judgement day?
| 52-6F-62 wrote:
| 1. Publishing advertisements all over the place. 2. Some
| Tuesday
| pton_xd wrote:
| "To start, you can choose from Integrations for 10 popular
| services, including Atlassian's Jira and Confluence, Zapier,
| Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and
| Plaid. ... Each integration drastically expands what Claude can
| do."
|
| Give us an LLM with better reasoning capabilities, please! All
| this other stuff just feels like a distraction.
| Centigonal wrote:
| Building integrations is a more predictable way of developing a
| smaller competitive advantage than research is. I think most of
| the leading AI companies are adopting a multi-arm strategy of
| research + product/ecosystem development to balance their
| risks.
| atonse wrote:
| I disagree. They can walk and chew gum, do both things at once.
| And this practical stuff is very important.
|
| I've been using the Atlassian MCP for nearly a month now, and
| it's completely changed (and eliminated) the feeling of having
| an overwhelming backlog.
|
| I can have it do things like "find all the tickets related to
| profile editing and combine them into one epic" where it works
| perfectly. Or "help me prioritize the 15 tickets assigned to me
| this sprint" and it'll actually go through and suggest "maybe
| you can do these two tickets first since they seem smaller,
| then do this big one" - i haven't hooked it up to my calendar
| yet.
|
| But I'd love for it to suggest things like "do this one ticket
| that requires a lot of heads-down time on Wednesday since you
| don't have any meetings. I can create a block on your calendar
| so that nobody will schedule a meeting then"
|
| Those are all superhuman things that can be done with MCP and a
| smart model.
|
| I've defined rules in cursor that say "when I ask you to mark
| something ready for test, change the status and assign it to <x
| person>, and leave a comment summarizing the changes"
|
| If you look at my JIRA comments now, you'd wonder how I had so
| much time to write such thorough comments. I don't, Cursor and
| whatever model is doing it for me.
|
| It's been an absolute game changer. MCP is going to be what the
| App Store was to mobile. Yes, you can get by without it, but
| actually hooking into all your daily tools is when this stuff
| gets insanely valuable in a practical sense.
| OJFord wrote:
| > If you look at my JIRA comments now, you'd wonder how I had
| so much time to write such thorough comments. I don't, Cursor
| and whatever model is doing it for me.
|
| How do your colleagues feel about it?
| warkdarrior wrote:
| My colleagues' LLM assistants think that my LLM assistant
| leaves great JIRA comments.
| atonse wrote:
| haha! Funny enough I do have to tell the LLMs to leave
| concise comments.
|
| I also don't want to read too many unnecessary words.
| sdesol wrote:
| Joking aside, I do believe we are moving into an era where
| we have LLMs write for each other and humans have a
| dedicated TL;DR. This includes code with a lot of
| comments or design styles that might seem obvious or
| stupid but can help another LLM.
| eknkc wrote:
| Why use JIRA at this point then?
|
| Can't we point an LLM to a sqlite db and tell it to treat
| it as an issue tracking db, and have everyone do the same?
|
| The service (jira) would materialize inside the LLMs
| then.
|
| Why even use abstractions like tickets, etc.? Ask the LLM
| what to do.
| zoogeny wrote:
| JIRA is more than just ticket management for most big
| orgs. It provides a reporting interface for business with
| long-term planning capabilities. A lot of the annoying
| things that devs have to do in JIRA is often there to
| make those functions more valuable. In other cases it is
| a compliance thing as well. Some certifications necessary
| for enterprise sales require audit trails for all code
| changes, from the bug report to the code commit. JIRA
| provides the integration and reporting necessary for
| that.
|
| Unless you can provide the same visibility, long-term
| planning features and compliance aspects of JIRA on top
| of your sqlite db, you won't compete with JIRA. But if you
| do add those things on top of SQLite and LLMs, you
| probably have a solid business idea. But you'd first need
| to understand JIRA well enough to know why those features
| are there in the first place.
| falcor84 wrote:
| Exactly, applying the principle of Chesterton's Fence
| [0].
|
| [0] https://en.wikipedia.org/w/index.php?title=Wikipedia:FENCE
| atonse wrote:
| Well I had half a mind to not tell them to see what they'd
| say, but I also was excited to show everyone so they can
| also be empowered with it.
|
| One of them said "yeah I was wondering cuz you never write
| that much" - as a leader, I actually don't set a good
| example of how to leave quality JIRA comments. And my view
| with all these things is that I have to lead by example,
| not by orders.
|
| With the help of these kinds of tools, we can improve the
| quality of these comments. And I wouldn't expect others to
| write them manually, more that I wanted to show that
| everyone's use of JIRA on the team can improve.
| OJFord wrote:
| Notice they commented on the quantity, not the quality?
|
| I don't think it's good leadership to unleash drivel on
| an organisation, have people waste time reading and
| perhaps replying to it, thinking it's something important
| and thoughtful coming from atonse.
|
| Good thing you told them though, now they can ignore it.
| stefan_ wrote:
| It sure seems like the next evolution of Jira though.
| Designed to waste everyone's time, picked by "leaders"
| that don't use it. Why not spam tickets with LLM drivel?
| They are perfect for picking up on all the inconsistency
| in the PM-insanity-driven, custom-designed workflow - and
| commenting on it, tagging a bunch of stray people seen in
| the ticket history, the universal exit hatch.
| atonse wrote:
| In another comment I mentioned that I ask for it to be
| concise.
|
| Also, a lot of the kinds of comments are things like,
| when you combine a bunch of tickets, leaving comments on
| the cancelled tickets to show why they were cancelled.
|
| In the past, that info simply wouldn't be there.
| sensanaty wrote:
| Someone please shoot me if my PM ever gets the idea in his
| head of spamming tickets with LLM slop en masse.
|
| There's nothing I hate more than people sending me their
| AI messages, be it in a ticket or a PR or even on Slack.
| I'm forced to engage and spend effort on something it
| took them all of 3 seconds to generate, without them even
| proofreading what they're sending me. The number of
| times I've had to ask 11 clarifying questions because
| their message has 11 contradictions within itself is
| maddening to the highest degree.
|
| The worst is when I call out one of these numerous
| contradictions and the reply is "oh haha, stupid Claude
| :)". It makes my blood boil and at the same time amazes me
| that someone has so little pride and respect for their
| fellow humans as to do crap like that.
| artur_makly wrote:
| "I remember those days when we manually wrote
| comments"... - what were comments papa?
| atonse wrote:
| Sounds like your coworkers might be abusing things here.
|
| I'm not remotely interested in throwing random slop in
| there.
|
| In fact, we did try a year ago to have AI help write our
| tickets and it was very clear that they were AI
| generated. There was way too much nonsense in there that
| wasn't relevant to our product.
|
| So we don't do that.
| zoogeny wrote:
| Honestly, that backlog management idea is probably the first
| time an MCP actually sounded appealing to me.
|
| I'm not in that world at the moment, but I've been the lead
| on several projects where the backlog has become a dumping
| ground of years of neglect. You end up with this tiered
| backlog thing where one level of backlog gets too big so you
| create a second tier of backlog for the stuff you are
| actually going to work on. Pretty soon you end up with
| duplicates in the second tier backlog for items already in
| the base level backlog since no one even looks at that old
| backlog anymore.
|
| I've done a lot of tidy up myself when I inherit this kind of
| mess, just closing tickets we definitely will never get to,
| de-duping, adding context when available, grouping into
| epics, tagging with relevant "tech-debt", "security", "bug",
| "automation", etc. But when there are 100s of tickets it is a
| slog. Having an LLM do this makes so much sense.
| organsnyder wrote:
| I have Claude hooked up to our project management system,
| GitHub, and my calendar (among other things). It's already
| proving extremely useful for various project management
| tasks.
| edaemon wrote:
| Lots of reported security issues with MCP servers seemed to be
| mitigated by their local-only setup. These MCP implementations
| are remotely accessible; do they address security differently?
| paulgb wrote:
| Largely, yes -- one of the big issues with using other people's
| random MCP servers is that they are run by default as a system
| process, even if they only need to speak over an API. Remote
| MCP mitigates this by not running any untrusted code locally.
|
| What it _doesn't_ seem to yet mitigate is prompt injection
| attacks, where a tool call description of one tool convinces
| the model to do something it shouldn't (like send sensitive
| data to a server owned by the attacker.) I think these concerns
| are a little bit overblown though; things like pypi and the
| Chrome Extension store scare me more and it doesn't stop them
| from mostly working.
| zoogeny wrote:
| They offhand mention OAuth integration in their discussion of
| Cloudflare integrated solutions. I can't see how that would be
| any less secure than any other OAuth protected API offering.
| Nijikokun wrote:
| Context windows are too small, and conversely, larger windows
| are not accurate enough. It's annoying.
| indigodaddy wrote:
| So any chat to Claude will now just auto-activate web search to
| be included? What if I try to use it just as a search engine
| exclusively? Also, will proxies like OpenRouter have access to the
| web search capabilities?
| gianpaj wrote:
| > Web search is now globally available to all Claude.ai paid
| plans.
| surfingdino wrote:
| I don't know why web search is such a big deal. You can
| implement it with any LLM that offers an API and function
| calling.
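|
| With Anthropic's own API, for example, it's just a tool schema
| plus a loop. A rough sketch (the model alias and the
| run_web_search stub are mine; plug in any search backend):
|
| import anthropic
|
| def run_web_search(query: str) -> str:
|     # Stand-in for your search backend of choice.
|     return "...results..."
|
| client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
| # Describe a search tool; the model decides when to invoke it.
| tools = [{
|     "name": "web_search",
|     "description": "Search the web and return result snippets.",
|     "input_schema": {
|         "type": "object",
|         "properties": {"query": {"type": "string"}},
|         "required": ["query"],
|     },
| }]
|
| msg = client.messages.create(
|     model="claude-3-7-sonnet-latest",
|     max_tokens=1024,
|     tools=tools,
|     messages=[{"role": "user", "content": "Who won the race?"}],
| )
|
| for block in msg.content:
|     if block.type == "tool_use":  # model asked for a search
|         results = run_web_search(block.input["query"])
|         # ...send results back as a tool_result message and loop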
| tene80i wrote:
| Do you think most people know how to do that, or even what it
| means? The market is larger than just software engineers.
| ChicagoDave wrote:
| There is targeted value in integrations, but everything still
| leads back to larger context windows.
|
| I love MCP (it's way better than plain Claude) but even that runs
| into context walls.
| davee5 wrote:
| I'm quite struck by the title of this announcement. The box being
| drawn around "your world" shows how narrow the AI builder's
| window into reality tends to be.
|
| > a new way to connect your apps and tools to Claude. We're also
| expanding... with an advanced mode that searches the web.
|
| The notion of software eating the world, and AI accelerating that
| trend, always seems to forget that The World is a vast thing, a
| physical thing, a thing that by its very nature can never be
| fully consumed by the relentless expansion of our digital
| experiences. Your worldview != the world.
|
| The cynic would suggest that the teams that build these tools
| should go touch grass, but I think that misses the mark. The real
| indictment is of the sort of thinking in which improvements to
| digital tools [intelligences?] in and of themselves can
| constitute truly substantial and far-reaching changes.
|
| The reach of any digital substrate is inherently limited, and this
| post unintentionally lays that bare. And while I hear
| accelerationists invoking "robots" as the means for digital
| agents to expand their potent impact deeper into the real world I
| suggest this is the retort of those who spend all day in apps,
| tools, and the web. The impacts and potential of AI are indeed
| enormous, but some perspective remains warranted and occasional
| injections of humility and context would probably do these teams
| some good.
| dang wrote:
| (Just for context: we've since changed the title above.
| Corporate press release titles are rarely a good fit for HN and
| we usually change them.
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
| )
| atonse wrote:
| I think with MCPs and related tech, if Apple just internally went
| back to the drawing board and integrated the concept of MCPs
| directly into iOS (via the "Apple Intelligence" umbrella) and
| seamlessly integrated it into the App Store and apps, they would
| win the mobile race for this.
|
| Being Apple, they would have to come up with something novel like
| they did with push (where you have _one_ OS process running that
| delegates to apps rather than every app trying to handle push
| themselves) rather than having 20 MCP servers running. But I
| think if they did this properly, it would be so amazing.
|
| I hope Apple is really re-thinking their absolutely comical start
| with AI. I hope they regroup and hit it out of the park (like how
| Google initially stumbled with Bard, but are now hitting it out
| of the park with Gemini).
| mattlondon wrote:
| Do you really think Apple can catch up with and then surpass
| all these SOTA AI labs?
|
| They bet big and got distracted on VR. It was obviously the
| wrong choice at the time, and even more so now. They're going
| to have to abandon all that VR crap and pivot hard to AI to try
| and catch up. I think the more likely case is they _can't_
| catch up now and will just have to end up licensing Gemini from
| Google/Google paying them to use Gemini as the default AI.
| atonse wrote:
| No I'm not saying Apple even has to build their own model.
| I'm saying Apple can build a stellar _product_ experience
| around it.
|
| As others have pointed out, if that's what App Intents are,
| have they started to integrate this as part of Apple
| Intelligence?
| _pdp_ wrote:
| Apple already has the equivalent of MCP.
| https://developer.apple.com/documentation/appintents.
| bloomca wrote:
| That's just App Intents. I don't think they lack data at this
| point; they just struggle with how to use that data at the OS
| level.
| cruffle_duffle wrote:
| The video demos never really showed the auth "story" but I assume
| that there is some oauth step to connect Claude with your MCP
| service, right?
| belter wrote:
| All these integrations are likely to cause a massive security
| leak sooner or later.
| OJFord wrote:
| Where's the permissioning, the data protection?
|
| People will say 'aaah ad company' (me too sometimes) but I'd
| honestly trust a Google AI tool with this way more. Not just
| because it already has access to my Google Workspace obviously,
| but just because it's a huge established tech firm with decades
| of experience in trying not to lose (or have taken) user data.
|
| Even if they get the permissions right and it can only read my
| stuff if I'm just asking it to 'research', now Anthropic has all
| that and a target on their backs. And I don't even know what 'all
| that' is, whatever it explored deeming it maybe useful.
|
| Maybe I'm just transitioning into old guy not savvy with latest
| tech, but I just can't trust any of this 'go off and do whatever
| seems correct or helpful with access to my filesystem/Google
| account/codebase/terminal' stuff.
|
| I like chat-only (well, +web) interactions where I control the
| input and taking the output, but even that is not an experience
| that gives me any confidence in giving uncontrolled access to
| stuff and it always doing something correct and reasonable. It's
| often confidently incorrect too! I wouldn't give an intern free
| rein in my shell either!
| joshwarwick15 wrote:
| Permissioning: OAuth. Data protection: local LLMs.
| weinzierl wrote:
| If you do not enable "Web Search" are you guaranteed it does not
| access the web anyway?
|
| Sometimes I want a pure model answer and I used to use Claude for
| that. For research tasks I preferred ChatGPT, but I found that
| you cannot reliably deny it web access. If you are asking it a
| research question, I am pretty sure it uses web search, even when
| _" Search"_ and _" Deep Research"_ are off.
| rafram wrote:
| Oh no, remote MCP servers. Security was nice while it lasted!
| rvz wrote:
| This is a fantastic time to get into the security space and
| trick all these LLMs into leaking sensitive data and make a lot
| of money out of that.
|
| MCP is a flawed spec and quite frankly a scam.
| knowaveragejoe wrote:
| What makes a remotely hosted MCP server less secure? The
| alternative, and what most of MCP consists of at the moment, is
| essentially running arbitrary code on your machine, as your
| user, and hooking this up to an LLM.
| rvz wrote:
| Can't wait for the first security incident relating to the
| fundamentally flawed MCP specification, in which an LLM is
| inadvertently tricked into leaking sensitive data.
|
| Increasing the number of "connections" to the LLM increases the
| risk of a leak, and it gives you more rope to hang yourself with
| when at least one connection becomes problematic.
|
| Now is a _great_ time to be a LLM security consultant.
| dimgl wrote:
| This is great, but can you fix Claude 3.7 and make it more like
| 3.5? I'm seriously disappointed with 3.7. It seems to be
| performing significantly worse for me on all tasks.
|
| Even my wife, who normally used Claude to create interesting
| recipes to bake cookies, has noticed a huge downgrade in 3.7.
| t0lo wrote:
| 3.7 seems to have way more filler and ambiguity and fewer
| insights for me.
| bjornsing wrote:
| The strategic business dynamic here is very interesting. We used
| to have "GPT-wrapper SaaS". I guess what we're about to see now
| is the opposite: "SaaS/MCP-wrapper GPTs".
| jimbokun wrote:
| The GPT wrappers were always going to be subsumed by
| improvements to the models themselves.
|
| LLMs wrapping the services makes more sense, as the data stored
| in those services adds a lot of value to off the shelf LLMs.
| bjornsing wrote:
| I think I agree. There's a lot of utility in a single LLM
| that can talk to many SaaS and integrate them. Feels like a
| better path forward than a separate LLM inside every SaaS.
| hdjjhhvvhga wrote:
| The people who connect an LLM to their PayPal and Cloudflare
| accounts perfectly deserve the consequences, both positive and
| negative.
| conroy wrote:
| Remote MCP servers are still in a strange space. Anthropic
| updated the MCP spec about a month ago with a new Streamable HTTP
| transport, but it doesn't appear that Claude supports that
| transport yet.
|
| When I hooked up our remote MCP server, Claude sends a GET
| request to the endpoint. According to the spec, clients that want
| to support both transports should first attempt to POST an
| InitializeRequest to the server URL. If that returns a 4xx, it
| should then assume the SSE integration.
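|
| The probe the spec describes is simple enough. A sketch with
| the requests library (the endpoint URL is a placeholder):
|
| import requests
|
| def detect_transport(url: str) -> str:
|     """Try the new Streamable HTTP transport first; fall back
|     to the older SSE transport if the POST gets a 4xx."""
|     init = {
|         "jsonrpc": "2.0",
|         "id": 1,
|         "method": "initialize",
|         "params": {
|             "protocolVersion": "2025-03-26",
|             "capabilities": {},
|             "clientInfo": {"name": "probe", "version": "0.1"},
|         },
|     }
|     resp = requests.post(
|         url,
|         json=init,
|         headers={"Accept": "application/json, text/event-stream"},
|         timeout=10,
|     )
|     if 400 <= resp.status_code < 500:
|         return "sse"  # server only speaks the old transport
|     resp.raise_for_status()
|     return "streamable-http"
|
| print(detect_transport("https://example.com/mcp"))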
| gonzan wrote:
| So there are going to be companies built on just an MCP server,
| I guess. I wonder what the first big one will be; just a matter
| of time, I think.
| worldsayshi wrote:
| Is it just me who would like to see more confirmation before
| making opaque changes to remote systems?
|
| I might not dare to add an integration if it can potentially add
| a bunch of stuff to the backing systems without my approval.
| Confirmations and review should be part of the protocol.
| sepositus wrote:
| Yeah, this was my first thought. I was watching the video of it
| creating all of these Jira tickets just thinking in my head: "I
| hope it just did all that correctly." I think the level of
| patience with my team would be very low if I started running an
| LLM that accidentally deleted a bunch of really important
| tickets.
| worldsayshi wrote:
| Yeah. Feels like it's breaking some fundamental UX principle.
| If an action is going to make any significant change, make
| sure that it fulfills _at least_ one of these:
|
| 1. Can be rolled back/undone
|
| 2. Clearly states exactly what it's going to do in a
| reviewable way
|
| If those aren't fulfilled, you're going to end up with users
| who are afraid of using your app.
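|
| Even a dumb gate in the MCP client would satisfy principle 2.
| A sketch (the readOnlyHint annotation is from the spec's tool
| hints mentioned upthread; everything else here is made up):
|
| def call_tool(tool: dict, args: dict, execute) -> object:
|     """Show a reviewable preview and require explicit consent
|     before any tool call that isn't marked read-only."""
|     hints = tool.get("annotations", {})
|     if not hints.get("readOnlyHint", False):
|         print(f"About to run {tool['name']} with:")
|         for k, v in args.items():
|             print(f"  {k} = {v!r}")
|         if input("Proceed? [y/N] ").strip().lower() != "y":
|             return None  # user declined; nothing was mutated
|     return execute(tool["name"], args)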
| todsacerdoti wrote:
| Check out 2500+ MCP servers at https://mcp.pipedream.com
| the_clarence wrote:
| Been playing with MCP in the last few days and it's basically a
| more streamlined way to define tools/function calls.
|
| That + OpenAI's agent SDK makes creating agentic flows so
| easy.
|
| On the other hand, you're kinda forced to run these tools / MCP
| servers in their own process, which makes no sense to me.
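|
| For reference, here's what that looks like with the official
| Python SDK (a sketch using its FastMCP helper; note it still
| runs as its own stdio process, which is the part I mean):
|
| from mcp.server.fastmcp import FastMCP
|
| mcp = FastMCP("demo")
|
| @mcp.tool()
| def add(a: int, b: int) -> int:
|     """Add two numbers."""
|     return a + b
|
| if __name__ == "__main__":
|     mcp.run()  # stdio transport by default: its own process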
| nilslice wrote:
| you might like mcp.run, a tool management platform we're
| working on... totally agree running a process per tool, with
| all kinds of permissions is nonsensical - and the move to
| "remote MCP" is a good one!
|
| but, we're taking it a step (or two) further, enabling you to
| dynamically build up a MCP server from other servers managed in
| your account with us.
|
| try it out, or let me get you a demo! this goes for any casual
| comment readers too ;)
|
| https://cal.com/team/dylibso/mcp.run-demo
| the_clarence wrote:
| I meant I wanted to run them synchronously in the same
| process :)
| nilslice wrote:
| that's what this does :)
|
| you bundle mcp servers into a profile, which acts as a
| single virtual mcp server and can be dynamically updated
| without re-configuring your mcp client (e.g. claude)
| kostas_f wrote:
| Anthropic's strategy seems to go towards "AI as universal glue".
| They want to tie Claude into all the tools teams already live in
| (Jira, Confluence, Zapier, etc.). That's a smart move for
| enterprise adoption, but it also feels like they're compensating
| for a plateau in core model capabilities.
|
| Both OpenAI and Google continue to push the frontier on
| reasoning, multimodality, and efficiency whereas Claude's recent
| releases have felt more iterative. I'd love to see Anthropic push
| into model research again.
| bl4ckneon wrote:
| I am sure they are already doing that. To think that an AI
| researcher is doing essentially API integration work is a bit
| silly. Multiple efforts can happen at the same time.
| kostas_f wrote:
| They certainly have internal research efforts underway, but
| I'm talking about what's actually been released to end users
| via the Claude app or API. Their latest public Sonnet release,
| 3.7 (Feb 2025), felt pretty incremental compared to Sonnet 3.5
| (June 2024), especially when you compare them to OpenAI's and
| Google's released models. In terms of the models you can
| integrate today, Anthropic hasn't quite kept pace on either
| reasoning performance or cost efficiency.
| freewizard wrote:
| I would expect Slack to do this. Maybe Slack and Claude should
| merge one day, given that MS and Google have their own core
| models.
| tjsk wrote:
| Slack is owned by Salesforce which is doing its own
| Agentforce stuff
| spacebanana7 wrote:
| Salesforce loves acquisitions. I can already picture
| Benioff's victory speech on CNBC.
| kostas_f wrote:
| Anthropic is now too expensive to be acquired. Only Amazon
| could be a potential buyer, given that out of the 3 big cloud
| providers, it's the only one without its own model
| offering.
| deanc wrote:
| I find it absolutely astonishing that Atlassian hasn't yet
| provided an LLM for Confluence instances, and that instead a
| third party is required. The sheer scale of documentation and
| information I've seen at some organisations I've worked with is
| overwhelming. This would be a killer feature. I do not recommend
| Confluence to my clients simply because the search is so
| appalling.
|
| Keyword search is such a naive approach to information discovery
| and information sharing - and renders Confluence in big orgs
| useless. Being able to discuss and ask questions is a more
| natural way of unpacking problems.
| artur_makly wrote:
| On their announcement page they wrote: "In addition to these
| updates, we're making WEB SEARCH available globally for all
| Claude users on paid plans."
|
| So I tested a basic prompt:
|
| 1. go to : SOME URL
|
| 2. copy all the content found VERBATIM, and show me all that
| content as markdown here.
|
| Result: it FAILED miserably with a few basic HTML pages - it
| simply is not loading all the page content in its internal
| browser.
|
| What worked well:
|
| - Gemini 2.5 Pro (Experimental)
|
| - GPT 4o-mini
|
| - Gemini 2.0 Flash (not verbatim but summarized)
| meander_water wrote:
| Looks like this is possible due to the relatively recent addition
| of OAuth2.1 to the MCP spec [0] to allow secure comms to remote
| servers.
|
| However, there's a major concern that server hosters are on the
| hook to implement authorization. Ongoing discussion here [1].
|
| [0] https://modelcontextprotocol.io/specification/2025-03-26
|
| [1]
| https://github.com/modelcontextprotocol/modelcontextprotocol...
| dmarble wrote:
| Direct link to the spec page on authorization:
| https://modelcontextprotocol.io/specification/2025-03-26/bas...
|
| Source:
| https://github.com/modelcontextprotocol/modelcontextprotocol...
| marifjeren wrote:
| That github issue is closed but:
|
| > major concern that server hosters are on the hook to
| implement authorization
|
| Doesn't it make perfect sense for server hosters to implement
| that? If Claude wants access to my Jira instance on my behalf,
| and Jira hosts a remote MCP server that aids in exposing the
| resources I own, isn't it obvious Jira should be responsible
| for authorization?
|
| How else would they do it?
| cruffle_duffle wrote:
| The authorization server and resource server can be separate
| entities. Meaning that the Jira instance can validate the token
| without being the one issuing it or handling credentials.
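|
| i.e. the resource server just validates tokens against the
| issuer's published keys. A PyJWT sketch (the issuer/audience
| values are placeholders):
|
| import jwt  # PyJWT
|
| ISSUER = "https://auth.example.com"        # separate auth server
| AUDIENCE = "https://jira.example.com/mcp"  # this resource server
|
| jwks = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks.json")
|
| def validate(token: str) -> dict:
|     """Check a bearer token minted elsewhere: signature via the
|     issuer's JWKS plus issuer/audience/expiry claims. No
|     credentials are handled here."""
|     key = jwks.get_signing_key_from_jwt(token).key
|     return jwt.decode(
|         token, key,
|         algorithms=["RS256"],
|         issuer=ISSUER,
|         audience=AUDIENCE,
|     )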
| marifjeren wrote:
| Yes, this is true of OAuth, which is exactly what the
| latest Model context protocol is using.. What's the concern
| again?
|
| I guess maybe you are saying the onus is NOT on the MCP
| server but on the authorization server.
|
| Anyway while technically true this is mostly just
| distracting because:
|
| 1. in my experience the resource server and the
| authorization server are almost always maintained by the
| same company -- Jira/Atlassian being an example
|
| 2. the resource server still minimally has the
| responsibility of identifying and integrating with some
| authorization server, and *someone* has to be the
| authorization server, so I'm not sure deferring the
| responsibility to that unidentified party is a strong
| defense against the critique anyway. The strong defense is:
| of course the MCP server should have these
| responsibilities.
| meander_water wrote:
| I think the pain points will be mostly for enterprise
| customers who want to integrate servers into their auth
| systems.
|
| For example, say you have a JIRA self hosted instance
| with SSO to entra id. You can't just install an MCP
| server off the shelf because authZ and resources are
| tightly coupled and implementation specific. It would be
| much easier if the server only handled providing
| resources, and authZ was offloaded to a provider of your
| choosing.
| marifjeren wrote:
| I'm under the impression that what you described is
| exactly how the new model context protocol works, since
| it's using oauth and is therefore unaware of any of the
| authentication (eg SSO) details. Your authentication
| process could be done via carrier pigeon and Claude would
| be none the wiser.
| halter73 wrote:
| That github issue is closed because it's been mostly
| completed. As of https://github.com/modelcontextprotocol/modelcontextprotocol...,
| the latest draft specification does not require the resource
| server to act as or proxy to the IdP. It just hasn't made its
| way to a ratified spec yet, but SDKs are already implementing
| the draft.
| bdd_pomerium wrote:
| This is very cool. Integrations look slick. Folks are
| understandably hyped--the potential for agents doing "deep
| research-style" work across broad data sources is real.
|
| But the thread's security concerns--permissions, data protection,
| trust--are dead on. There is also a major authN/Z gap, especially
| for orgs that want MCP to access internal tools, not just curated
| SaaS.
|
| Pushing complex auth logic (OAuth scopes, policy rules) into
| every MCP tool feels backwards.
|
| * Access-control sprawl. Each tool reinvents security. Audits get
| messy fast.
|
| * Static scopes vs. agent drift. Agents chain calls in ways no
| upfront scope list can predict. We need per-call context checks.
|
| * Zero-Trust principles mismatch. Central policy enforcement is
| the point. Fragmenting it kills visibility and consistency.
|
| We already see the cost of fragmented auth: supply-chain hits and
| credential reuse blowing up multiple tenants. Agents only raise
| the stakes.
|
| I think a better path (and one that, in full disclosure, we're
| actively working on at Pomerium) is to have:
|
| * One single access point in front of all MCP resources.
|
| * Single sign-on once, then short-lived signed claims flow
| downstream.
|
| * AuthN separated from AuthZ with a centralized policy engine
| that evaluates every request, deny-by-default (a toy sketch
| follows below). Evaluation in both directions with hooks for
| DLP.
|
| * Unified management, telemetry, audit log and policy surface.
|
| I'm really excited about what MCP is putting us in the direction
| of being able to do with agents.
|
| But without a higher level way to secure and manage the access,
| I'm afraid we'll spend years patching holes tool by tool.
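|
| To make deny-by-default concrete, the core of the gateway check
| can be as small as the toy sketch below (identities, tool names,
| and the policy table are all hypothetical; real policy engines
| are much richer):
|
| # Hypothetical central gateway: every tool call is denied
| # unless a policy rule explicitly allows it for this identity.
| POLICY = {
|     ("analyst@example.com", "jira.search"): "allow",
|     ("analyst@example.com", "jira.create_issue"): "allow",
| }
|
| def audit_log(user: str, tool: str, decision: str) -> None:
|     print(f"audit: {user} -> {tool}: {decision}")
|
| def authorize(user: str, tool: str) -> bool:
|     """Deny-by-default: the absence of a rule means no."""
|     decision = POLICY.get((user, tool), "deny")
|     audit_log(user, tool, decision)  # unified audit trail
|     return decision == "allow"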
| tkgally wrote:
| For the past couple of months, I've been running occasional side-
| by-side tests of the deep research products from OpenAI, Google,
| Perplexity, DeepSeek, and others. Ever since Google upgraded its
| deep research model to Gemini 2.5 Pro Experimental, it has been
| the best for the tasks I give them, followed closely by OpenAI.
| The others were far behind.
|
| I ran two of the same prompts just now through Anthropic's new
| Advanced Research. The results for it and for ChatGPT and Gemini
| appear below. Opinions might vary, but for my purposes Gemini is
| still the best. Claude's responses were too short and simple and
| they didn't follow the prompt as closely as I would have liked.
|
| Writing conventions in Japanese and English
|
| https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-0...
|
| https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMH...
|
| https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7...
|
| Overview of an industry in Japan
|
| https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e...
|
| https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3...
|
| https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7...
|
| The second task, by the way, is just a hypothetical case. Though
| I have worked as a translator in Japan for many years, I am not
| the person described in the prompt.
| noisy_boy wrote:
| What is the best stack/platform to get started with MCP? I'm
| talking in terms of ergonomics, features and popularity.
| jngiam1 wrote:
| This is awesome. We implemented an MCP client that's fully
| compatible with the new remote MCP specs, supports OAuth and all.
| It's really smooth and I think paves the way for AI to work with
| tools. https://lutra.ai/mcp
| jes5199 wrote:
| the MCP spec as it stands today is pretty half-baked. It's
| clear that the first edition was trying to emulate STDIO over
| HTTP, but that meant holding open a connection indefinitely. The
| new revision tries to solve this by letting you hold open as many
| connections as you want! but that makes it vague about message
| delivery ordering when you have multiple streams open. There even
| seems to be part of the spec that is logically impossible -
| people are wrestling with it in the GitHub issues.
|
| which is to say: I'm not sure it actually wins, technically, over
| the OpenAI/OpenAPI idea from last year, which was at least easy
| to understand
| sagarpatil wrote:
| Should have just called it Remote MCP. Integrations sounds very
| vague.
| Surac wrote:
| I often use Claude 3.7 on programming things never done before.
| Even extensive searching on the web brings up zero hits. I
| understand that this is very uncommon, but my work portfolio is
| more science than real programming. Claude 3.7 really "thinks"
| about the questions I ask. But 3.5 regularly drifts into dream
| mode if asked anything beyond its training data. So if you ask
| for code easily found on the web you will see no difference. Try
| asking things not so common and you will see a difference.
| myflash13 wrote:
| Finally I can do something simple that I've wanted to do for
| ages: paste in a poster image or description of an event and tell
| the AI to add it to my calendar.
| gjohnhazel wrote:
| I just have it create an .ics file and open that
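|
| Which is easy to script as well. A minimal sketch of the file
| the model writes (the event fields here are made up):
|
| from datetime import datetime, timedelta
|
| def make_ics(summary: str, start: datetime, hours: int = 1) -> str:
|     """Emit a minimal single-event iCalendar file."""
|     fmt = "%Y%m%dT%H%M%S"
|     end = start + timedelta(hours=hours)
|     return "\r\n".join([
|         "BEGIN:VCALENDAR",
|         "VERSION:2.0",
|         "PRODID:-//example//ics-sketch//EN",
|         "BEGIN:VEVENT",
|         f"UID:{int(start.timestamp())}@example",
|         f"DTSTART:{start.strftime(fmt)}",
|         f"DTEND:{end.strftime(fmt)}",
|         f"SUMMARY:{summary}",
|         "END:VEVENT",
|         "END:VCALENDAR",
|     ])
|
| with open("event.ics", "w", newline="") as f:
|     f.write(make_ics("Poster talk", datetime(2025, 5, 10, 18, 0)))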
| franze wrote:
| > Integrations and advanced Research are now available in beta on
| the Max, Team, and Enterprise plans, and will soon be available
| on Pro.
| MarkMarine wrote:
| The Plaid integration is just to let you look at your install? I
| was excited to see all my accounts (as a consumer) knit together
| and reported on by Claude. Bummer.
| sebstefan wrote:
| An AI capable of responding to a "How do I do X" prompt
| with "Hey, this seems related to a ticket that was already opened
| on your Jira 2 months ago", or "There is a document about this in
| Sharepoint", would bring me such immense value, I think I
| might cry.
|
| Edit: Actually right in the tickets themselves would probably be
| better and not require MCP... but still
| MagicMoonlight wrote:
| Copilot can already be set up to use SharePoint etc. And you can
| set it up to only respond based on internal content.
|
| So if you ask it "who is in charge of marketing", it will read
| it off SharePoint instead of answering generically.
| elia_42 wrote:
| Very interesting. The integration videos are great to start right
| away and try out the new features. The extensions of the deep
| reasoning capabilities are also incredible.
|
| I think we are coming to a new automated technology ecosystem
| where LLMs will orchestrate many different parts of software with
| each other, speeding up the launch, evolution and monitoring of
| products.
| abhisek wrote:
| Looks to me like another app ecosystem is coming up, similar to
| Android or iPhone. We are probably going to see a lot of AI app
| marketplaces that solve the problems of discovery, billing &
| integration with AI hosts like Claude Desktop.
| game_the0ry wrote:
| It's only a matter of time before folks write user stories and
| an LLM takes over the first draft, then they iterate from there.
|
| Btw, that speaks to how important it is to get clear business
| requirements for work.
| clintonb wrote:
| Greptile (https://www.greptile.com/) tries to do that, at least
| for bug tickets. I recall being annoyed by its suggestions
| (posted as Linear comments).
| dakshgupta wrote:
| Co-founder of Greptile - that was a bad feature that we have
| since deprecated to focus entirely on AI code reviews.
| ausbah wrote:
| Usability like this seems to be a big nail in the coffin for
| OSS LLM usage.
___________________________________________________________________
(page generated 2025-05-02 23:02 UTC)