[HN Gopher] Claude Integrations
___________________________________________________________________
Claude Integrations
Author : bryanh
Score : 690 points
Date : 2025-05-01 16:02 UTC (1 day ago)
(HTM) web link (www.anthropic.com)
(TXT) w3m dump (www.anthropic.com)
| behnamoh wrote:
| That "Allow for this chat" pop up should be optional. It ruins
| the entire MCP experience. Maybe make it automatic for non-
| mutating MCP tools.
| pcwelder wrote:
| In the latest update they've replaced "Allow for this chat"
| with "Always Allow".
| avandekleut wrote:
| MCP also has support for "hints" which note whether an action
| is destructive.
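|
| For reference, here's roughly what a tool entry returned from
| tools/list looks like with those annotations, per the MCP spec
| (expressed as a Python dict; the tool name and schema are made
| up for illustration):
|
|     # Sketch of an MCP tool definition with safety hints.
|     tool = {
|         "name": "delete_record",
|         "description": "Delete a record by id",
|         "inputSchema": {
|             "type": "object",
|             "properties": {"id": {"type": "string"}},
|         },
|         "annotations": {
|             "readOnlyHint": False,     # does more than read
|             "destructiveHint": True,   # may destroy data
|             "idempotentHint": False,   # repeat calls differ
|         },
|     }
|
| A client could auto-approve calls where readOnlyHint is true
| and only pop the confirmation for destructive ones.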
| arjie wrote:
| The cookie-banner-style constant Allow, Allow, Allow makes
| their client unusable. Are there any alternative desktop MCP
| clients?
| rahimnathwani wrote:
| https://github.com/patruff/ollama-mcp-bridge
| jarbus wrote:
| Anyone have any data on how effective models are at leveraging
| MCP? Hard to tell if these things are a buggy mess or a game
| changer
| striking wrote:
| Claude Code is doing pretty well in my experience :) I've built
| a tool in our CI environment that reads Jira tickets, files
| GitHub PRs, etc. automatically. Great for one-shotting bugs,
| and it's only getting better.
| xnx wrote:
| Integrations are nice, but the superpower is having an AI smart
| enough to operate a computer/keyboard/mouse so it can do anything
| without the cooperation/consent of the service being used.
|
| Lots of people are making moves in this space (including
| Anthropic), but nothing has broken through to the mainstream.
| WillAdams wrote:
| Or even access multiple files?
|
| Why can't one set up a prompt, test it against a file, then
| once it is working, apply it to each file in a folder in a
| batch process which then provides the output as a single
| collective file?
| xnx wrote:
| You can probably achieve what you want with
| https://github.com/simonw/llm and a little bit of command
| line.
|
| Not sure what OS you're on, but in Windows it might look like
| this:
|
| FOR %%F IN (*.txt) DO (TYPE "%%F" | llm -s "execute this
| prompt" >> "output.txt")
| WillAdams wrote:
| I want to work with PDFs (or JPEGs), but that should be a
| start, I hope.
| xnx wrote:
| llm supports attachments too
|
| FOR %%F IN (*.pdf) DO (llm -a "%%F" -s "execute this
| prompt" >> output.txt)
| TheOtherHobbes wrote:
| I've just done something similar with Claude Desktop and its
| built-in MCP servers.
|
| The limits are still buggy responses - Claude often gets
| stuck in a useless loop if you overfeed it with files - and
| lack of consistency. Sometimes hand-holding is needed to get
| the result you want. And it's slow.
|
| But when it works it's amazing. If the issues and limitations
| were solved, this would be a complete game changer.
|
| We're starting to get somewhat self-generating automation and
| complex agenting, with access to all of the world's public
| APIs and search resources, controlled by natural language.
|
| I can't see the edges of what could be possible with this.
| It's limited and clunky for now, but the potential is
| astonishing - at least as radical an invention as the web
| was.
| WillAdams wrote:
| I would be fine with storing the output from one run,
| spooling up a new one, then concatenating after multiple
| successive runs.
| pglevy wrote:
| I've been using Claude Desktop with built-in File MCP to run
| operations on local files. Sometimes it will do things
| directly but usually it will write a Python script. For
| example: combine multiple .md files into one or organize
| photos into folders.
|
| I also use this method for doing code prototyping by giving
| it the path to files in the local working copy of my repo.
| Really cool to see it make changes in a vite project and it
| just hot reloads. Then I make tweaks or commit changes as
| usual.
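|
| The scripts it writes for the markdown case are usually just a
| few lines, something like (a sketch of typical output):
|
|     import pathlib
|
|     # Concatenate every .md file in notes/ into one document,
|     # separated by horizontal rules.
|     files = sorted(pathlib.Path("notes").glob("*.md"))
|     combined = "\n\n---\n\n".join(
|         f.read_text(encoding="utf-8") for f in files
|     )
|     pathlib.Path("combined.md").write_text(
|         combined, encoding="utf-8"
|     )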
| arnaudsm wrote:
| I often get rate-limited or blocked from websites because I
| browse them too fast with my keyboard and mouse. The AI would
| be slowed down significantly.
|
| LLM-desktop interfaces make great demos, but they are too slow
| to be usable in practice.
| xnx wrote:
| Good point. Probably makes sense to think of it as an
| assistant you assign a job to and get results back later.
| boh wrote:
| I think all the retail LLMs are working to broaden the available
| context, but in most practical use-cases it's having the ability
| to minimize and filter the context that would produce the most
| value. Even a single PDF with too many similar datapoints leads
| to confusion in output. They need to switch gears from the high
| growth, "every thing is possible and available" narrative, to one
| that narrows the scope. The "hallucination" gap is widening with
| more context, not shrinking.
| mikepurvis wrote:
| That's a tough pill to swallow when your company valuation is
| $62B, based on the premise that you're building a bot capable of
| transcendent thought, ready to disrupt every vertical in
| existence.
|
| Tackling individual use-cases is supposed to be something for
| third party "ecosystem" companies to go after, not the
| mothership itself.
| Etheryte wrote:
| This has been my experience as well. The moment you turn
| internet access on, Kagi Assistant starts outputting garbage.
| Turn it off and you're all good.
| fhd2 wrote:
| Definitely my experience. I manage context like a hawk, be it
| with Claude-as-Google-replacement or LLM integrations into
| systems. Too little and the results are off. Too much and the
| results are off.
|
| Not sure what Anthropic and co can do about that, but
| integrations feel like a step in the wrong direction. Whenever
| I've tried tool use, it was orders of magnitude more expensive
| and generally inferior to a simple model call with curated
| context from SerpApi and such.
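|
| For the curious, "curated context" here just means: fetch a few
| search snippets, paste them into one prompt. A minimal sketch
| with SerpApi's Python client and Anthropic's SDK (the query,
| model name, and keys are placeholders):
|
|     from serpapi import GoogleSearch
|     import anthropic
|
|     results = GoogleSearch(
|         {"q": "example query", "api_key": "SERPAPI_KEY"}
|     ).get_dict()
|     snippets = "\n".join(
|         f"- {r['title']}: {r.get('snippet', '')}"
|         for r in results["organic_results"][:5]
|     )
|
|     client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY
|     msg = client.messages.create(
|         model="claude-3-5-sonnet-latest",
|         max_tokens=1024,
|         messages=[{
|             "role": "user",
|             "content": f"Context:\n{snippets}\n\nAnswer: ...",
|         }],
|     )
|     print(msg.content[0].text)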
| loufe wrote:
| Couldn't agree more. I wish all major model makers would
| build tools into their proprietary UIs to "summarize contents
| and start a new conversation with that base". My biggest
| slowdown with working with LLMs while coding is moving my
| conversation to a new thread because the context limit is hit
| (Claude) or the coherent-thought threshold is exceeded
| (Gemini).
| fhd2 wrote:
| I never use any web interfaces, just hooked up gptel (an
| Emacs package) to Claude's API and a few others I regularly
| use, and I just have a buffer with the entire conversation.
| I can modify it as needed, spawn a fresh one quickly etc.
| There are also features to add files and individual snippets,
| but I usually manage it all in a single buffer. It's a
| powerful text editor, so efficient text editing is a given.
|
| I bet there are better / less arcane tools, but I think
| powerful and fast mechanisms for managing context are key
| and for me, that's really just powerful text editing
| features.
| medhir wrote:
| you hit the nail on the head. my experience with prompting LLMs
| is that providing extra context that isn't explicitly needed
| leads to "distracted" outputs
| ketzo wrote:
| I mean, to be honest, they gotta do both to achieve what
| they're aiming for.
|
| A truly useful AI assistant has context on my last 100,000
| emails - and also recalls the details of each individual one
| perfectly, without confusion or hallucination.
|
| Obviously I'm setting a high bar here; I guess what I'm saying
| is "yes, and"
| energy123 wrote:
| There's a niche for the kitchen sink approach. It's a type of
| search engine.
|
| Throw in all context --> ask it what is important for problem
| XYZ --> curate what it tells you, and feed that to another
| model to actually solve XYZ
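|
| The nice thing is the two-stage version is only a few lines to
| wire up (a sketch using the llm library; the model names are
| placeholders):
|
|     import llm
|
|     big_context = open("everything.txt").read()
|     curator = llm.get_model("gemini-2.5-pro")  # big window
|     solver = llm.get_model("gpt-4o")           # actual work
|
|     # Stage 1: ask what is important for the problem.
|     relevant = curator.prompt(
|         f"{big_context}\n\nList only facts relevant to XYZ."
|     ).text()
|     # Stage 2: solve with just the curated context.
|     print(solver.prompt(
|         f"Context:\n{relevant}\n\nSolve XYZ."
|     ).text())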
| roordan wrote:
| This is my concern as well. How successful is it in selecting
| the correct tool out of hundreds or thousands?
|
| Unlike what this integration is pushing, for LLM usage in
| production-grade products where high accuracy (99%) is a
| requirement, you have to give a very limited tool set to get
| any degree of success.
| bredren wrote:
| Had been planning a custom MCP for our org's Jira.
|
| I'm a bit skeptical that it's gonna work out of the box because
| of the number of custom fields that seem to be involved to make
| successful API requests in our case.
|
| But I would welcome not having to solve this problem. Jira's
| interface is among the worst of all the ticket tracking
| applications I have encountered.
|
| But I have found that using an LLM conversation paired with
| enough context about what is involved for successful POSTs
| against the API allows me to create, update, and relate issues
| via curl.
|
| It's begging for a chat based LLM solution like this. I'd just
| prefer the underlying model not be locked to a vendor.
|
| Atlassian should be solving this for its customers.
| viraptor wrote:
| You can also do the same thing locally:
https://github.com/sooperset/mcp-atlassian Either with the
| Claude app, or some other system with any tool-using LLM you
| want.
| bredren wrote:
| I'm familiar with that MCP and was planning to build on top
| of it.
|
| I hadn't realized but the new integration seems to actually
| just be an official, closed-source MCP produced *by*
| Atlassian.
|
| sooperset's MCP is MIT licensed, so I wonder how much of the
| Atlassian edition is just a lift of that.
|
| There's a comment [1] on the actual integration page asking
| about custom fields, which I think is possibly a big issue.
|
| At first I thought the open-source version would get crushed
| by an actual Atlassian release, but not if Atlassian doesn't
| offer all the support for it to work really well no matter
| what customizations are fitted into each instance.
|
| My hypothesis is that it takes custom code to make this work,
| and using an off-the-shelf one for Jira won't work. Hoping to be
| proven wrong though, as it would be less work for me on that
| front.
|
| [1] https://community.atlassian.com/forums/Atlassian-Platform-ar...
| rubenfiszel wrote:
| I feel dumb but how do you actually add Zapier or Confluence or
| custom MCP on the web version of Claude? I only see it for
| Drive/Gmail/Github. Is it zoned/slow release?
| throwaway314155 wrote:
| edit: <Incorrect>I'm fairly certain these additions only work
| on Claude Desktop?</Incorrect>
|
| That or they're pulling an OpenAI and launching a feature that
| isn't actually fully live.
| rubenfiszel wrote:
| But the videos show Claude web
| 85392_school wrote:
| This part seems relevant:
|
| > in beta on the Max, Team, and Enterprise plans, and will soon
| be available on Pro
| joshwarwick15 wrote:
| Created a list of remote MCP servers here so people can keep
| track of new releases -
| https://github.com/jaw9c/awesome-remote-mcp-servers
| zhyder wrote:
| Is there any way to access this via the API, after perhaps some
| oauth from the Anthropic user account?
| throwup238 wrote:
| The leapfrogging is getting insane (in a good way, I guess?).
| The amount of time each state-of-the-art feature gets before
| it's supplanted is down to a few weeks at this point.
|
| LLMs were always a fun novelty for me until OpenAI DeepResearch
| which started to actually come up with useful results on more
| complex programming questions (where I needed to write all the
| code by hand but had to pull together lots of different libraries
| and APIs), but it was limited to 10/month for the cheaper plan.
| Then Google Deep Research upgraded to 2.5 Pro with paid usage
| limits of 20/day, which allowed me to just throw everything at it
| to the point where I'm still working through reports that are a
| week or more old. Oh and it searched up to 400 sources at a time,
| significantly more than OpenAI which made it quite useful in
| historical research like identifying first edition copies of
| books.
|
| Now Claude is releasing the same research feature with
| integrations (excited to check out the Cloudflare MCP auth
| solution and hoping Val.town gets something similar), and a run
| time of up to 45 minutes. The pace of change was overwhelming
| half a year ago, now it's just getting ridiculous.
| user_7832 wrote:
| I agree with your overall message - rapid growth appears to
| encourage competition and forces companies to put their best
| foot forward.
|
| However, unfortunately, I cannot shower much praise on Claude
| 3.7. And if you (or anyone) asks why - 3.7 seems much better
| than 3.5, surely? - then I'm moderately sure that you use
| Claude much more for coding than for any kind of conversation.
| In my opinion, even 3.5 _Haiku_ (which is available for free
| during high loads) is better than 3.7 Sonnet.
|
| Here's a simple test. Try asking 3.7 to intuitively explain
| anything technical - say, mass-dominated vs spring-dominated
| oscillations. I'm a mechanical engineer who studied this stuff
| and _I_ could not understand 3.7's analogies.
|
| I understand that coders are the largest single group of
| Claude's users, but Claude went from being my most used app to
| being used only after both ChatGPT and Gemini, something that I
| absolutely regret.
| airstrike wrote:
| I too like 3.5 better than 3.7 and I use it pretty often.
| It's like 3.7 is better in 2 metrics but worse in 10
| different ones
| joshstrange wrote:
| I use Claude mostly for coding/technical things and something
| about 3.7 does not feel like an upgrade. I haven't gone back
| to 3.5 (mostly started using Gemini Pro 2.5 instead).
|
| I haven't been able to use Claude research yet (it's not
| rolled out to the Pro tier) but o1 -> o3 deep research was a
| massive jump IMHO. It still isn't perfect but o1 would often
| give me trash results but o3 deep research actually starts to
| be useful.
|
| 3.5->3.7 (even with extended thinking) felt like a
| nothingburger.
| mattlutze wrote:
| The expectation that one model be top marks for all things
| is, imo, asking too much.
| tiberriver256 wrote:
| 3.7 did score higher in coding benchmarks but in practice 3.5
| is much better at coding. 3.7 ignores instructions and does
| things you didn't ask it to do.
| UncleEntity wrote:
| I think it just does that to eat up your token quota and
| get you to upgrade.
|
| Like, ask it a simple question and it comes up with a full
| repo, complete with a README and a Makefile, when all you
| wanted to know was how efficient a particular algorithm
| would be in the included code.
|
| Can't wait until they add research to the Pro plan because,
| you know, I have questions...
| vineyardmike wrote:
| > I think it just does that to eat up your token quota
| and get you to upgrade.
|
| If you pay for a subscription then they don't have an
| incentive to use more tokens for the same answer.
|
| It's definitely because feedback from people has "taught"
| it that more boilerplate is better. It's the same reason
| ChatGPT is annoyingly complimentary.
| spaceman_2020 wrote:
| 3.7 is too overactive
|
| I prefer Gemini 2.5 pro for all code now
| conception wrote:
| 2.5 is my "okay Claude can't get it" but first I check my
| "bank account" to see if I can afford it.
| ralusek wrote:
| Isn't 2.5 pro significantly cheaper?
| yunwal wrote:
| They're the same price, and Gemini has a large free tier.
| hombre_fatal wrote:
| Gemini 2.5 Pro has solved problems that Claude 3.7
| cannot, so I use it for the hard stuff.
|
| But Gemini is at least as overactive as Claude, sometimes
| even more overactive when it comes to something like
| comment spam.
|
| Of course, this can be fixed with prompting. And
| sometimes it feels sheepish complaining about the machine
| god doing most of my chore work that didn't even exist a
| couple years ago.
| suyash wrote:
| That has been the most annoying thing about it; so glad I'm
| not paying for it anymore.
| danw1979 wrote:
| Can't you still use Sonnet 3.5 anyway? Or is that a
| paying-subscriber-only feature?
| sannee wrote:
| I suspect that is precisely why it got better at coding
| benchmarks.
| garrickvanburen wrote:
| My current hypothesis: the more familiar you are with a topic
| the worse the results from any LLM.
| jeswin wrote:
| > My current hypothesis: the more familiar you are with a
| topic the worse the results from any LLM.
|
| That's not really true, since your prompts are also getting
| better. "Better input leads to better output" remains true,
| even with LLMs (when you see them as a tool).
| franga2000 wrote:
| Being more familiar with the topic definitely doesn't
| always make your prompts better. For a lot of things it
| doesn't really change (explain X, compare X and Y...) -
| and this is what is being discussed here. For giving
| "building" instructions (like writing code) it helps a
| bit, but even if you know exactly what you want it to
| write, getting it to do that is pretty much trial and
| error (too much detail makes it follow word-for-word and
| produce bad code, too little and it misses important
| parts or makes dumb mistakes).
| jm547ster wrote:
| The opposite may be true: the more effective the model,
| the lazier the prompting, as it can seemingly handle not
| being micromanaged the way earlier versions required.
| mac-mc wrote:
| He was saying that 3.5 is better than 3.7 on the same topic
| he knows well tho.
| user_7832 wrote:
| That is certainly the case in niche topics where published
| information is lacking, or needs common sense to synthesize
| proper outputs [1].
|
| However in this specific example, I don't remember if it
| was ChatGPT or Gemini or 3.5 Haiku, but the other(s)
| explained it well enough. I think I re-asked 3.5 Haiku at a
| later point in time, and to my complete non-surprise, it
| gave an answer that was quite decent.
|
| 1 - For example, the field of DIY audio - which was funnily
| enough the source of my question. I'm no speaker designer,
| but combining creativity with engineering basics/rules of
| thumb seems to be something LLMs struggle with terribly.
| Ask them to design a speaker and they come up with the most
| vanilla, tired, textbook design - despite several existing
| market products that are already so much ahead/innovative.
|
| I'm confident that if you asked an LLM an identical
| question for which there _is_ more discourse - eg make an
| interesting /innovative phone - you'd get relatively much
| better results.
| terminalcommand wrote:
| I built open baffle speakers based on measurements and
| discussion I had with Claude. I think it is really good.
|
| I am a novice, maybe that's why I liked it.
| danw1979 wrote:
| Amen to this. As soon as you ask an LLM to explain
| something in detail that you're a domain expert in, that's
| when you notice the flaws.
| startupsfail wrote:
| Yes, it's particularly bad when the information found on
| the web is flawed.
|
| For example, I'm not a domain expert, but I was looking
| for an RC motor for a toy project, and OpenAI happily
| tried to source a few with Deep Research. Only the best
| candidate it picked contained an obvious typo in the
| motor spec (68 grams instead of 680 grams), which is just
| impossible for a motor of the specified dimensions.
| parineum wrote:
| > Yes, it's particularly bad when the information found
| on the web is flawed.
|
| It's funny you say that because I was going to echo your
| parent's sentiment and point out it's exactly the same
| with any news article you read.
|
| The majority of content these LLMs are consuming is not
| from domain experts.
| 91bananas wrote:
| I had it generate a baseball lineup the other day; it
| printed out a list of the 13 kids' names, then said (12
| players). It just straight up miscounted what it was doing,
| throwing a wrench into everything else it was doing beyond
| that point.
| eru wrote:
| Not really. I'm getting pretty good Computer Science theory
| out of Gemini and even ChatGPT.
| bsenftner wrote:
| It is like this with expert humans too. Which is why, no
| matter what, we will continue to _require_ expert humans
| not just "in the loop" but as the critical cogs that are
| the loop itself, just as it has always been. However, this
| time around those people will have AI augmentation, and be
| intellectual athletes of a nature our civilization has
| never seen.
| simsla wrote:
| I always tell people to trust the LLM to the same extent as
| an intern. Avoid giving it tasks you cannot verify the
| correctness of.
| fastball wrote:
| Seems clear to me that Claude 3.7 suffers from overfitting,
| probably due to Anthropic seeing that 3.5 was a smash hit in
| the LLM coding space and deciding their North star for 3.7
| should be coding benchmarks (which, like all benchmarks, do
| not properly capture the process of real-world coding).
|
| If it was actually good they would've named it 4.0; the fact
| that they went from 3.5 to 3.7 (weird jump) speaks volumes
| imo.
| snewman wrote:
| The numbering jump is because there was "Claude 3.5" and
| then "Claude 3.5 (new)" and they decided to retroactively
| stop the madness and rename the latter to 3.6 (which is what
| everyone was calling it anyway).
| csomar wrote:
| Plateauing overall, but apparently you can gain in certain
| directions while you lose in others. I wrote an article a
| while back arguing that current models are not that far from
| GPT-3.5: https://omarabid.com/gpt3-now
|
| 3.7 is definitely better at coding, but you feel it lost a
| bit of maneuverability in other domains. For someone who
| wants code generated, it doesn't matter, but I've found myself
| using DeepSeek first and then getting code output from 3.7.
| ilrwbwrkhv wrote:
| None of those reports are any good though. Maybe for shallow
| research, but I haven't found them deep. Can you share what
| kind of research you have been trying there where it has done a
| great job of actual deep research?
| Balgair wrote:
| I'm echoing this sentiment.
|
| Deep Research hasn't really been that good for me. Maybe I'm
| just using it wrong?
|
| Example: I want the precipitation in mm and monthly high and
| low temperature in C for the top 250 most populous cities in
| North America.
|
| To me, this prompt seems like a pretty anodyne and obvious
| task for Deep Research. It's long, tedious, but mostly coming
| from well structured data sources (wikipedia) across two
| languages at most.
|
| But when I put this in to any of the various models, I mostly
| get back ways to go and find that data myself. Like, I know
| how to look at Wikipedia, it's that I don't want to comb
| through 250 pages manually or try to write a script to handle
| all the HTML boxes. I want the LLM/model to do this days long
| tedious task for me.
| 85392_school wrote:
| The funny thing is that if your request only needed the top
| 100's temperature or the top 33's precipitation, it could
| just read "List of cities by average temperature" or "List
| of cities by average precipitation" and that would be it,
| but the top 250 requires reading 184x more pages.
|
| My perspective on this is that if Deep Research can't do
| something, you should do it yourself and put the results on
| the internet. It'll help other humans and AIs trying to do
| the same task.
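|
| If you do end up doing it yourself, the per-page scrape is
| mostly pandas boilerplate (a sketch; the table index varies
| per page, which is exactly the tedious part):
|
|     import pandas as pd
|
|     url = ("https://en.wikipedia.org/wiki/"
|            "List_of_cities_by_average_temperature")
|     tables = pd.read_html(url)  # one DataFrame per wikitable
|     north_america = tables[1]   # index depends on page layout
|     print(north_america.head())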
| Balgair wrote:
| Yeah, that was intentional, well, somewhat.
|
| The project requires the full list of every known city in
| the western hemisphere and also Japan, Korea, and Taiwan.
| But that dataset is just maddeningly large, if it is
| possible at all. Like, I expect it to take me years, as I
| have to do a lot of translations. So, I figured that I'd
| be nice and just ask for the top 250 from the various
| models.
|
| There's a lot more data that we're trying to get too and
| I'm hoping that I can get approval to post it as it's a
| work thing.
| wyre wrote:
| If you have the data, but need to parse all of it,
| couldn't you upload it to your LLM of choice (with a
| large enough context window) and have it finish your
| project?
| XenophileJKO wrote:
| Well, remember that listing/ranking things is structurally
| hard for these models because you have to keep track of
| what has been listed and what hasn't, etc.
| Balgair wrote:
| I'm sorry I was unclear. No, I do not have the data yet
| and I need to get it.
| therein wrote:
| Sounds like you're having it conduct research and
| then solve the Knapsack problem for you on the collected
| data. We should do the same for the traveling salesman
| one.
|
| How do you validate its results in that scenario? Just
| take its word for it?
| Balgair wrote:
| Ahh, no. We'll be doing more research on the data once we
| have it. Things like ranking and averages and
| distributions on the data will come later, but first we
| just need the data to begin with.
| sxg wrote:
| That's actually not what deep research is for, although you
| can obviously use it however you like. Your query is just
| raw data collection--not research. Deep research is about
| exploring a topic primarily with academic and other high-
| quality sources. It's a starting point for your own
| research. Deep research creates a summary report in ~10 min
| from more sources than you could probably read in a month,
| and then you can steer the conversation from there.
| Alternatively, you can just use deep research's sources as
| a reading list for yourself so you can do your own
| analysis.
| Balgair wrote:
| I think we have very different definitions of the word
| 'research' then.
|
| I'd say that what you're saying is 'synthesis'. The
| 'Intro/Discussion' sections of a journal article.
|
| For me, 'research' means the work of going through and
| getting all the data in the first place. Like, going out
| and collecting dino bones in the hot sun, measuring all
| the soil samples, etc. - that is research. For me, asking
| these models to go collate some webpages, I mean, you
| spend the first weeks of a summer undergrad's time to go
| do this kind of thing to get them used to the file systems
| and spruce up their organization skills, see where they
| are at. Writing the paper up, that's part of research
| sure, but not the hard part that really matters.
| sxg wrote:
| Agreed--we're working with different definitions of
| "research". The deep research products from OpenAI,
| Google Gemini, and Perplexity seem to be more aligned
| with my definition of research if that helps you gain
| more utility from them.
| tomrod wrote:
| It's excellent at producing short literature reviews on
| open access papers and data. It has no sense of judgment,
| trusting most sources unless instructed otherwise.
| fakedang wrote:
| Gemini's Deep Research is very good at discriminating
| between sources though, in my experience (haven't tried
| Claude or Perplexity). It finds really obscure but very
| relevant documents that don't even show up in Google
| Search for the same queries. It also discounts results
| that are otherwise irrelevant or very low-value from the
| final report. But again, it is just a starting point as
| the generated report is too short, and I make sure to
| check all the references it gives once again. But that's
| where I find its value.
| spaceman_2020 wrote:
| My wife, who is writing her PhD right now and teaches
| undergraduate students, says they are at the level of a
| really bright final year undergrad
|
| Maybe in a year, they'll hit the graduate level. But we're
| not near PhD level yet
| xrdegen wrote:
| It is because you are just such a genius who already knows
| everything, unlike us stupid people who find these tools
| amazingly useful and informative.
| cwillu wrote:
| The failure mode is that people unfamiliar with a subject
| aren't able to distinguish careful analysis from bullshit.
| However the second failure mode where someone pointing that
| out is assumed to be calling people stupid is a
| longstanding wetware bug.
| greymalik wrote:
| Out of curiosity - can you give any examples of the programming
| questions you are using deep research on? I'm having a hard
| time thinking of how it would be helpful and could use the
| inspiration.
| dimitri-vs wrote:
| Easy: for any research task that will take you 5 minutes to
| complete, it's worth firing off a Deep Research request while
| you work on something else in parallel.
|
| I use it a lot when documentation is vague or outdated. When
| Gemini/o3 can't figure something out after 2 tries. When I am
| working with a service/API/framework/whatever that I am very
| unfamiliar with and I don't even know what to Google search.
| jerpint wrote:
| Have you tried using llms.txt when available? Very useful
| resource
| emorning3 wrote:
| I often use Chrome to validate what I think I know.
|
| I recently asked Chrome to show me how to apply the Knuth-
| Bendix completion procedure to propositional logic, and I had
| already formed my own thoughts about how to proceed (I'm
| building a rewrite system that does automated reasoning).
|
| The response convinced me that I'm not a total idiot.
|
| I'm not an academic and I'm often wrong about theory so the
| validation is really useful to me.
| scargrillo wrote:
| That's a perfect example of LLMs providing epistemic
| scaffolding -- not just giving you answers, but helping you
| check your footing as you explore unfamiliar territory.
| Especially valuable when you're reasoning through something
| structurally complex like rewrite systems or proof
| strategies. Sometimes just seeing your internal model
| reflected back (or gently corrected) is enough to keep you
| moving.
| miki_oomiri wrote:
| "Chrome" ? What do you mean? Gemini?
| risyachka wrote:
| What are you talking about?
|
| It has literally stagnated for a year now.
|
| All that's changed is they connect more APIs.
|
| And add a thinking loop with the same model powering it.
|
| This is the reason it seems fast - nothing really happens
| except easy things.
| tymscar wrote:
| I totally agree with you, especially if you actually try
| using these models, not just looking at random hype posters
| on twitter or skewed benchmarks.
|
| That being said, isn't it strange how the community has polar
| opposite views about this? Did anything like this ever happen
| before?
| itissid wrote:
| I've been using it for pre-scoping things I have no idea about
| and rapidly iterating by refeeding it a version with guard
| rails and conditions from previous chats.
|
| Like when I wanted to scope how to build a homemade TrueNAS
| Scale unit: it helped me avoid pitfalls, like knowing that I
| needed two GPUs minimum to run the OS and local LLMs, and sped
| up config for a CLI backup of my Dropbox locally (it told me to
| use the right filesystem format over ZFS to make the Dropbox
| client work).
|
| It has covered everything from researching how I can structure
| my web app for building a payment system on the web (something
| I knew nothing about) to writing small tools that talk to my
| document collection and index it into Anki collections, in one
| day.
| wilg wrote:
| o3, since it can web search while reasoning, is a really
| useful lighter-weight deep research.
| spaceman_2020 wrote:
| Gemini 2.5 pro was the moment for me where I really thought
| "this is where true adoption happens"
|
| All those talks about AI replacing people seemed a little far
| fetched in 2024. But in 2025, I really think models are getting
| good enough
| antupis wrote:
| You still need "human in the loop" because with simple tasks
| or some tasks that have lots of training material, models can
| one-shot answer and are like super good. But if the domain
| grows too complex, there are some not-so-obvious
| dependencies, or stuff that is in bleeding edge. Models fail
| pretty badly. So you need someone to split those complex
| tasks to more simpler familiar steps.
| iLoveOncall wrote:
| Calling some APIs is leap-frogging? You could do this with
| GPT-3, nothing has changed except it's branded under a new name
| and tries to establish a (flawed) standard.
|
| If there was truly any innovation still happening in OpenAI,
| Anthropic, etc., they would be working on models only, not on
| side features that someone could already develop over a
| weekend.
| never_inline wrote:
| Why would you love on-call though?
| iLoveOncall wrote:
| In my previous team most of our oncall requests came from
| bug reports by customers on various tools that we owned, so
| to be able to work on random tools that my team owned was a
| nice change of pace / scenery compared to working on the
| same thing for 3 months uninterrupted.
|
| Now I'm in a new team where 99% of our oncall tickets come
| from automated alarms and 80% of them are a subset of a few
| issues where the root-cause isn't easy to address but there
| is either nothing to actually do once investigated, or the
| fix is a one time process that is annoying to run, so the
| username isn't accurate anymore :)
|
| I still like the change of pace though, 0 worries about
| sprint tasks or anything else for a week every few months.
| apwell23 wrote:
| > DeepResearch which started to actually come up with useful
| results on more complex programming questions
|
| Is there a YouTube video of people using this on complex open
| source projects like the Linux kernel, or maybe something
| like PyTorch?
|
| How come none of the OSS projects (at least not the ones I
| follow) are progressing fast(er) from AI like 'deep research'?
| WhitneyLand wrote:
| The integrations feel so RAG-ish. It talks, tells you it's going
| to use a tool, searches, talks about what it found...
|
| Hope one day it will be practical to do nightly finetunes of a
| model per company with all core corporate data stores.
|
| This could create a seamless native model experience that knows
| about (almost) everything you're doing.
| pyryt wrote:
| I would love to do this on my codebase after every commit
| notgiorgi wrote:
| why is finetuning talked about so much less than RAG? is it not
| viable at all?
| mring33621 wrote:
| I'm not an expert in either, but RAG is like dropping some
| 'useful' info into the prompt context, while fine tuning is
| more like performing a mix of retraining, appending re-
| interpretive model layers, and/or brain surgery.
|
| I'll leave it to you to guess which one is harder to do.
| disgruntledphd2 wrote:
| RAG is much cheaper to run.
| computerex wrote:
| It's significantly harder to get right; it's a very big
| stepwise increase in technical complexity over in-context
| learning/RAG.
|
| There are now some light versions of fine tuning that don't
| update all the model weights but train a small adapter layer
| (LoRA), which is way more viable commercially atm in my
| opinion.
| ijk wrote:
| There were initial difficulties in finetuning that made it
| less appealing early on, and that's snowballed a bit into
| having more of a focus on RAG.
|
| Some of the issues still exist, of course:
|
| * Finetuning takes time and compute; for one-off queries
| using in-context learning is vastly more efficient (i.e.,
| look it up with RAG).
|
| * Early results with finetuning had trouble reliably
| memorizing information. We've got a much better idea of how
| to add information to a model now, though it takes more
| training data.
|
| * Full finetuning is very VRAM intensive; optimizations like
| LoRA were initially good at transferring style and not
| content. Today, LoRA content training is viable but requires
| training code that supports it [1].
|
| * If you need a very specific memorized result and it's
| costly to get it wrong, good RAG is pretty much always going
| to be more efficient, since it injects the exact text in
| context. (Bad RAG makes the problem worse, of course).
|
| * Finetuning requires more technical knowledge: you've got to
| understand the hyperparameters, avoid underfitting and
| overfitting, evaluate the results, etc.
|
| * Finetuning requires more data. RAG works with a handful of
| datapoints; finetuning requires at least three orders of
| magnitude more data.
|
| * Finetuning requires extra effort to avoid forgetting what
| the model already knows.
|
| * RAG works pretty well when the task that you are trying to
| perform is well-represented in the training data.
|
| * RAG works when you don't have direct control over the model
| (i.e., API use).
|
| * You can't finetune most of the closed models.
|
| * Big, general models have outperformed specialized models
| over the past couple of years; if it doesn't work now, just
| wait for OpenAI to make their next model better on your
| particular task.
|
| On the other hand:
|
| * Finetuning generalizes better.
|
| * Finetuning has more influence on token distribution.
|
| * Finetuning is better at learning new tasks that aren't as
| present in the pretraining data.
|
| * Finetuning can change the style of output (e.g.,
| instruction training).
|
| * When finetuning pays off, it gives you a bigger moat (no
| one else has that particular model).
|
| * You control which tasks you are optimizing for, without
| having to wait for other companies to maybe fix your problems
| for you.
|
| * You can run a much smaller, faster specialized model
| because it's been optimized for your tasks.
|
| * Finetuning + RAG outperforms just RAG. Not by a lot,
| admittedly, but there's some advantages.
|
| Plus the RL Training for reasoning has been demonstrating
| unexpectedly effective improvements on relatively small
| amounts of data & compute.
|
| So there's reasons to do both, but the larger investment that
| finetuning requires means that RAG has generally been more
| popular. In general, the past couple of years have been won
| by the bigger models scaling fast, but with finetuning
| difficulty dropping there is a bit more reason to do your own
| finetuning.
|
| That said, for the moment the expertise + expense + time of
| finetuning makes it a tough business proposition if you don't
| have a very well-defined task to perform, a large dataset to
| leverage, or other way to get an advantage over the multi-
| billion dollar investment in the big models.
|
| [1] https://unsloth.ai/blog/contpretraining
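|
| To make the LoRA option concrete, the core of a content-
| oriented LoRA finetune is only a few lines on top of the
| Hugging Face stack (a sketch; the base model and
| hyperparameters are illustrative, not recommendations):
|
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Llama-3.1-8B")
|     config = LoraConfig(
|         r=16, lora_alpha=32, lora_dropout=0.05,
|         # attention-only adapters mostly transfer style;
|         # include MLP modules too when teaching content
|         target_modules=["q_proj", "v_proj"],
|     )
|     model = get_peft_model(base, config)
|     model.print_trainable_parameters()  # usually <1% of weights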
| jimbokun wrote:
| So is this a good summary?
|
| 1. If you have a large corpus of valuable data not
| available to the corporations, you can benefit from fine
| tuning using this data.
|
| 2. Otherwise just use RAG.
| msp26 wrote:
| Thanks for the detailed comment.
|
| I had no idea that fine tuning for adding information is
| viable now. Last I checked (year+ back) it seemed to not
| work well.
| omneity wrote:
| RAG is infinitely more accessible and cheaper than
| finetuning. But it is true that finetuning is getting
| severely overlooked in situations where it would outperform
| alternatives like RAG.
| riku_iki wrote:
| > RAG is infinitely more accessible and cheaper than
| finetuning.
|
| it depends on your data access pattern. If some text goes
| through the LLM input many times, it is more efficient for the
| LLM to be finetuned on it once.
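|
| Back of the envelope, with made-up but plausible numbers: a
| 50k-token document prepended to 1,000 calls a day at $3 per
| million input tokens is 50,000 x 1,000 x $0.000003 = $150/day
| of repeated context, versus a one-time finetuning cost. Prompt
| caching changes the math, but a break-even point exists.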
| omneity wrote:
| This assumes the team deploying the RAG-based solution
| has equal ability to either engineer a RAG-based system
| or to finetune an LLM. Those are different skillsets and
| even selecting which LLM should be finetuned is a complex
| question, let alone aligning it, deploying it, optimizing
| inference etc.
|
| The budget question comes into play as well. Even if text
| is repetitively fed to the LLM, that spend is spread over a
| long enough time that, compared to finetuning (which is a
| sort of capex), it is financially more accessible.
|
| Now bear in mind, I'm a big proponent of finetuning where
| applicable and I try to raise awareness to the
| possibilities it opens. But one cannot deny RAG is a lot
| more accessible to teams which are likely developers / AI
| engineers compared to ML engineers/researchers.
| riku_iki wrote:
| > But one cannot deny RAG is a lot more accessible to
| teams which are likely developers / AI engineers compared
| to ML engineers/researchers.
|
| It looks like major vendors provide a simple API for fine-
| tuning, so you don't need ML engineers/researchers:
| https://platform.openai.com/docs/guides/fine-tuning
|
| Setting RAG infra is likely more complicated than that.
| omneity wrote:
| You are certainly right, managed platforms make
| finetuning much easier. But managed/closed model
| finetuning is pretty limited and in fact should be named
| "distribution modeling" or something.
|
| Results with this method are significantly more limited
| compared to all the power open-weight finetuning gives
| you (and the skillset needed in return).
|
| And in either case don't forget alignment and evals.
| retinaros wrote:
| Fine tuning can cost $80 and a few hours. A good RAG doesn't
| exist.
| never_inline wrote:
| Can fine tuning produce results as grounded as RAG?
|
| How many epochs do you run?
| onel wrote:
| You usually fine tune when you want to add capabilities (an
| output style, JSON output, function calling, etc.). You use
| RAG to add knowledge.
| VSerge wrote:
| Ongoing demo of integrations with Claude by a bunch of A-list
| companies: Linear, Stripe, Paypal, Intercom, etc.. It's live now
| on: https://www.youtube.com/watch?v=njBGqr-BU54
|
| In case the above link doesn't work later on, the page for this
| demo day is here: https://demo-day.mcp.cloudflare.com/
| mkagenius wrote:
| Are people really doing this MCP thing? Yikes. Tomorrow, let
| me reinvent CSS as model context design (MCD).
| warkdarrior wrote:
| Do you have a better solution to give models on-demand access
| to data sources?
| mkagenius wrote:
| You mean other than writing an API? No.
| cruffle_duffle wrote:
| And what is the protocol for the interface between the GPU-
| based LLM and the API? How does the LLM signal to make a
| tool call? What mechanism does it use?
|
| Because MCP isn't an API; it's the protocol that defines how
| the LLM even calls the API in the first place. Without it,
| all you've got is a chat interface.
|
| A lot of people misunderstand the role of MCP. It's the
| signaling the LLM uses to reach out of its context window
| and do things.
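|
| To make that concrete: on the wire, MCP is JSON-RPC 2.0. When
| the model decides to use a tool, the client sends the server a
| message like this (a sketch as a Python dict; the tool name
| and arguments are made up):
|
|     request = {
|         "jsonrpc": "2.0",
|         "id": 7,
|         "method": "tools/call",
|         "params": {
|             "name": "search_issues",
|             "arguments": {"query": "login bug", "limit": 5},
|         },
|     }
|     # The server's result carries content blocks that the
|     # client injects back into the model's context window.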
| turblety wrote:
| Is there a reason they went and built some new standard,
| rather than just using an HTTP API?
| knowaveragejoe wrote:
| You can use either HTTP or stdio.
| imbnwa wrote:
| Feels like middle management is gonna go well before engineers
| do at the current LLM rate of advancement.
| DebtDeflation wrote:
| That started a while ago. Google "the great flattening".
| 6stringmerc wrote:
| Feed Claude the data willingly to learn more about human behavior
| they can't scrape or obtain otherwise without consent? Hard pass.
| I'm not telling any AI any more about what it means to be a
| creative person because training it how to suffer will only
| further hurt my job prospects. Nice try, no dice.
| n_ary wrote:
| Is this the beginning of the apps-for-everything era, and
| does the SaaS for your LLM finally begin? Initially we had
| the internet, but the value came when webapps arrived to
| become SaaS instead of installed apps. Now if LLMs can use a
| specific remote MCP, which is another SaaS for your LLM, the
| remote-MCP-powered service can charge a subscription to do
| wonderful things and voila! Let the new golden age of SaaS
| for LLMs begin and the old fad (replace job XYZ with AI) die
| already.
| throwaway7783 wrote:
| MCP is yet another interface for an existing SaaS (like UI and
| APIs), but now magically "agent enabled". And $$$ of course
| clvx wrote:
| I'm more excited that I can now run a custom site, hook an MCP
| up to it, and have all the cool intelligence I used to pay
| SaaS for, without having to integrate with them, plus govern
| my own data. It's a massive win. I just see AI-assisted coding
| replicating current SaaS services that I can run internally.
| If my shop was on a specific stack, I could aim to have all my
| supporting apps in that stack using AI-assisted coding,
| simplifying operations, and be able to hook up MCPs to get
| intelligence from all of them.
|
| Truly, OSS should be more interesting in the next decade for
| this alone.
| heyheyhouhou wrote:
| We should all thank the Chinese companies for releasing so
| many incredible open-weight models. I hope they keep doing
| it; I don't want to rely on OpenAI, Anthropic, or Google for
| all my future computer interactions.
| achierius wrote:
| Don't forget Meta, without them we probably wouldn't have
| half the publicly available models we do today.
| naravara wrote:
| On one hand, yes this is very cool for a whole host of personal
| uses. On the other hand giving any company this level of access
| to as many different personal data sources as are out there
| scares the shit out of me.
|
| I'd feel a lot better if we had something resembling a
| comprehensive data privacy law in the United States because I
| don't want it to basically be the Wild West for anyone handling
| whatever personal info doesn't get covered under HIPAA.
| falcor84 wrote:
| Absolutely agreed, but just wanted to mention that it's
| essentially the same level of access you would give to
| Zapier, which is one of their top examples of MCP
| integrations.
| n_ary wrote:
| It took many years of online tracking, iframes, sticky
| cookies, and Cambridge Analytica before things like GDPR came
| into existence. We will similarly have to wait a few years for
| major leaks to happen through LLM pipelines/integrations.
| Sadly, that is the reality we live with.
| jimbokun wrote:
| The question is whether or not it happens before the
| emergence of Skynet.
| OtherShrezzing wrote:
| I'd love a _tip jar_ MCP, where the LLM vendor can
| automatically tip my website for using its
| content/feature/service in a query's response. Even if the
| amount is absolutely minuscule, in aggregate, this might make
| up for ad revenue losses.
| fredoliveira wrote:
| Not that exactly, but I just saw this on twitter a few
| minutes ago from Stripe:
| https://x.com/jeff_weinstein/status/1918029261430255626
| insin wrote:
| It's perfect, nobody will have time to care about how many 9s
| your service has because the nondeterministic failure mode now
| sitting slap-bang in the middle is their problem!
| Manfred wrote:
| Imagine dynamic subscription rates based on vibes where you
| won't even notice price hikes because not even the supplier
| can explain what they are.
| donmcronald wrote:
| > Now if LLMs can use specific remote MCP which is another SaaS
| for your LLM, the remote MCP powered service can charge a
| subscription to do wonderful things and voila!
|
| I've always worked under the assumption the best employees make
| themselves replaceable via well defined processes and high
| quality documentation. I have such a hard time understanding
| why there's so much willingness to integrate irreplaceable SaaS
| solutions into business processes.
|
| I haven't used AI a ton, but everything I've done has focused
| on owning my own context, config, etc.. How much are people
| going to be willing to pay if someone else owns 10+ years of
| their AI context?
|
| Am I crazy or is owning the context massively valuable?
| brumar wrote:
| Hello fellow context owner. I like my modules with their
| context.sh at their root level. If crafted with care, magic
| happens. Reciprocally, when AI derails, it's most often due
| to bad context management and fixed by improving it.
| drivingmenuts wrote:
| Is each Claude instance a separate individual, or is it a shared
| Because I'm not sure I would want an AI that learned about my
| confidential business information sharing that with anyone else,
| without my express permission.
|
| This does not sound like it would be learning general information
| helpful across an industry, but specific, actionable information.
|
| If not available now, is that something that AI vendors are
| working toward? If so, what is to keep them from using that
| knowledge to benefit themselves or others of their choosing,
| rather than the people they are learning from?
|
| While people understand ethics, morals, and legality (and ignore
| them), that does not seem like something that an AI understands
| in a way that might give it pause before taking an action.
| zoogeny wrote:
| I'm curious what kind of research people are doing that takes 45
| minutes of LLM time. Is this a poke at the McKinsey consultant
| domain?
|
| Perhaps I am just frivolous with my own time, but I tend to use
| LLMs in a more iterative way for research. I get partial answers,
| probe for more information, direct the attention of the LLM away
| from areas I am familiar with and towards areas I am less
| familiar with. I feel if I just let it loose for 45 minutes it
| would spend too much time on areas I do not find valuable.
|
| This seems more like a play for "replacement" rather than
| "augmentation". Although, I suppose if I had infinite wealth, I
| could kick off 10+ research agents each taking 45 minutes and then
| review their output as it became available, then kick off round
| 2, etc. That is, I could do my usual process, but
| asynchronously instead of interactively.
| throwup238 wrote:
| That iterative research process is exactly how I use Google
| Deep Research since it has a 20/day rate limit. Research a
| problem, notice some off hand assumption or remark the report
| made, and fire off another research run asking about it. It
| depends on what you work on; in my use case I often have to do
| hours of research for 30 minutes of work like when integrating
| a bunch of different vendors' APIs or poring over datasheets
| for EE, so it's worth firing off research and then working on
| something else for 10-20 minutes (it helps that the Gemini app
| fires off a push notification when the report is done -
| Anthropic please do this! Even for requests made from the web
| app).
|
| As for long research times, one thing I've been using it for is
| historical research on old books. Gemini DeepResearch was the
| first one able to properly explain the nuances of identifying a
| chimeral first edition Origin of Species after taking half an
| hour and reading 400 sources. It went into all the important
| details like spelling errors and the properties of chimeral
| FY2** copies found in various libraries around the world.
| abhisek wrote:
| Where is Skynet and when is judgement day?
| 52-6F-62 wrote:
| 1. Publishing advertisements all over the place. 2. Some
| Tuesday
| pton_xd wrote:
| "To start, you can choose from Integrations for 10 popular
| services, including Atlassian's Jira and Confluence, Zapier,
| Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and
| Plaid. ... Each integration drastically expands what Claude can
| do."
|
| Give us an LLM with better reasoning capabilities, please! All
| this other stuff just feels like a distraction.
| Centigonal wrote:
| Building integrations is a more predictable way of developing a
| smaller competitive advantage than research is. I think most of
| the leading AI companies are adopting a multi-arm strategy of
| research + product/ecosystem development to balance their
| risks.
| atonse wrote:
| I disagree. They can walk and chew gum, do both things at once.
| And this practical stuff is very important.
|
| I've been using the Atlassian MCP for nearly a month now, and
| it's completely changed (and eliminated) the feeling of having
| an overwhelming backlog.
|
| I can have it do things like "find all the tickets related to
| profile editing and combine them into one epic" where it works
| perfectly. Or "help me prioritize the 15 tickets assigned to me
| this sprint" and it'll actually go through and suggest "maybe
| you can do these two tickets first since they seem smaller,
| then do this big one" - i haven't hooked it up to my calendar
| yet.
|
| But I'd love for it to suggest things like "do this one ticket
| that requires a lot of heads-down time on Wednesday since you
| don't have any meetings. I can create a block on your calendar
| so that nobody will schedule a meeting then"
|
| Those are all superhuman things that can be done with MCP and a
| smart model.
|
| I've defined rules in cursor that say "when I ask you to mark
| something ready for test, change the status and assign it to <x
| person>, and leave a comment summarizing the changes"
|
| If you look at my JIRA comments now, you'd wonder how I had so
| much time to write such thorough comments. I don't, Cursor and
| whatever model is doing it for me.
|
| It's been an absolute game changer. MCP is going to be what the
| App Store was to mobile. Yes, you can get by without it, but
| actually hooking into all your daily tools is when this stuff
| gets insanely valuable in a practical sense.
| OJFord wrote:
| > If you look at my JIRA comments now, you'd wonder how I had
| so much time to write such thorough comments. I don't, Cursor
| and whatever model is doing it for me.
|
| How do your colleagues feel about it?
| warkdarrior wrote:
| My colleagues' LLM assistants think that my LLM assistant
| leaves great JIRA comments.
| atonse wrote:
| haha! Funny enough I do have to tell the LLMs to leave
| concise comments.
|
| I also don't want to read too many unnecessary words.
| sdesol wrote:
| Joking aside, I do believe we are moving into an era where
| we have LLMs write for each other and humans have a
| dedicated TL;DR. This includes code with a lot of
| comments or design styles that might seem obvious or
| stupid but can help another LLM.
| eknkc wrote:
| Why use JIRA at this point then?
|
| Can't we point an LLM to a sqlite db and tell it to treat
| it as an issue tracking db, and have everyone do the same?
|
| The service (jira) would materialize inside the LLMs
| then.
|
| Why even use abstractions like tickets, etc.? Ask the LLM
| what to do.
| zoogeny wrote:
| JIRA is more than just ticket management for most big
| orgs. It provides a reporting interface for business with
| long-term planning capabilities. A lot of the annoying
| things that devs have to do in JIRA is often there to
| make those functions more valuable. In other cases it is
| a compliance thing as well. Some certifications necessary
| for enterprise sales require audit trails for all code
| changes, from the bug report to the code commit. JIRA
| provides the integration and reporting necessary for
| that.
|
| Unless you can provide the same visibility, long-term
| planning features and compliance aspects of JIRA on top
| of your sqlite db, you won't compete with JIRA. But if you
| do add those things on top of SQLite and LLMs, you
| probably have a solid business idea. But you'd first need
| to understand JIRA well enough to know why those features
| are there in the first place.
| falcor84 wrote:
| Exactly, applying the principle of Chesterton's Fence
| [0].
|
| [0] https://en.wikipedia.org/w/index.php?title=Wikipedia:FENCE
| atonse wrote:
| Well I had half a mind to not tell them to see what they'd
| say, but I also was excited to show everyone so they can
| also be empowered with it.
|
| One of them said "yeah I was wondering cuz you never write
| that much" - as a leader, I actually don't set a good
| example of how to leave quality JIRA comments. And my view
| with all these things is that I have to lead by example,
| not by orders.
|
| With the help of these kinds of tools, we can improve the
| quality of these comments. And I wouldn't expect others to
| write them manually, more that I wanted to show that
| everyone's use of JIRA on the team can improve.
| OJFord wrote:
| Notice they commented on the quantity, not the quality?
|
| I don't think it's good leadership to unleash drivel on
| an organisation, have people waste time reading and
| perhaps replying to it, thinking it's something important
| and thoughtful coming from atonse.
|
| Good thing you told them though, now they can ignore it.
| stefan_ wrote:
| It sure seems like the next evolution of Jira though.
| Designed to waste everyone's time, picked by "leaders"
| that don't use it. Why not spam tickets with LLM drivel?
| They are perfect for picking up on all the inconsistency
| in the PM-insanity-driven, custom-designed workflow - and
| commenting on it, tagging a bunch of stray people seen in
| the ticket history, the universal exit hatch.
| atonse wrote:
| In another comment I mentioned that I ask for it to be
| concise.
|
| Also, a lot of the kinds of comments are things like,
| when you combine a bunch of tickets, leaving comments on
| the cancelled tickets to show why they were cancelled.
|
| In the past, that info simply wouldn't be there.
| sensanaty wrote:
| Someone please shoot me if my PM ever gets the idea in his
| head of spamming tickets with LLM slop en masse.
|
| There's nothing I hate more than people sending me their
| AI messages, be it in a ticket or a PR or even on Slack.
| I'm forced to engage and spend effort on something it
| took them all of 3 seconds to generate, without them even
| proofreading what they're sending me. The number of
| times I've had to ask 11 clarifying questions because
| their message has 11 contradictions within itself is
| maddening to the highest degree.
|
| The worst is when I call out one of these numerous
| contradictions and the reply is "oh haha, stupid Claude
| :)". It makes my blood boil and at the same time amazes me
| that someone has so little pride and respect for their
| fellow humans as to do crap like that.
| artur_makly wrote:
| "I remember those days when we manually wrote
| comments"... - what were comments papa?
| atonse wrote:
| Sounds like your coworkers might be abusing things here.
|
| I'm not remotely interested in throwing random slop in
| there.
|
| In fact, we did try a year ago to have AI help write our
| tickets and it was very clear that they were AI
| generated. There was way too much nonsense in there that
| wasn't relevant to our product.
|
| So we don't do that.
| zoogeny wrote:
| Honestly, that backlog management idea is probably the first
| time an MCP actually sounded appealing to me.
|
| I'm not in that world at the moment, but I've been the lead
| on several projects where the backlog has become a dumping
| ground of years of neglect. You end up with this tiered
| backlog thing where one level of backlog gets too big so you
| create a second tier of backlog for the stuff you are
| actually going to work on. Pretty soon you end up with
| duplicates in the second tier backlog for items already in
| the base level backlog since no one even looks at that old
| backlog anymore.
|
| I've done a lot of tidy up myself when I inherit this kind of
| mess, just closing tickets we definitely will never get to,
| de-duping, adding context when available, grouping into
| epics, tagging with relevant "tech-debt", "security", "bug",
| "automation", etc. But when there are 100s of tickets it is a
| slog. Having an LLM do this makes so much sense.
| organsnyder wrote:
| I have Claude hooked up to our project management system,
| GitHub, and my calendar (among other things). It's already
| proving extremely useful for various project management
| tasks.
| edaemon wrote:
| Lots of reported security issues with MCP servers seemed to be
| mitigated by their local-only setup. These MCP implementations
| are remotely accessible; do they address security differently?
| paulgb wrote:
| Largely, yes -- one of the big issues with using other people's
| random MCP servers is that they are run by default as a system
| process, even if they only need to speak over an API. Remote
| MCP mitigates this by not running any untrusted code locally.
|
| What it _doesn't_ seem to yet mitigate is prompt injection
| attacks, where a tool call description of one tool convinces
| the model to do something it shouldn't (like send sensitive
| data to a server owned by the attacker.) I think these concerns
| are a little bit overblown though; things like pypi and the
| Chrome Extension store scare me more and it doesn't stop them
| from mostly working.
| zoogeny wrote:
| They offhand mention OAuth integration in their discussion of
| Cloudflare integrated solutions. I can't see how that would be
| any less secure than any other OAuth protected API offering.
| Nijikokun wrote:
| Context windows are too small, and conversely, larger windows
| are not accurate enough. It's annoying.
| indigodaddy wrote:
| So any chat to Claude will now just auto-activate web search to
| be included? What if I try to use it just as a search engine
| exclusively? Also, will proxies like OpenRouter have access to the
| web search capabilities?
| gianpaj wrote:
| > Web search is now globally available to all Claude.ai paid
| plans.
| surfingdino wrote:
| I don't know why web search is such a big deal. You can
| implement it with any LLM that offers an API and function
| calling.
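|
| With Anthropic's own API, for example, it's just a tool schema
| plus a loop. A rough sketch (the model alias and the
| run_web_search stub are mine; plug in any search backend):
|
| import anthropic
|
| def run_web_search(query: str) -> str:
|     # Stand-in for your search backend of choice.
|     return "...results..."
|
| client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
|
| # Describe a search tool; the model decides when to invoke it.
| tools = [{
|     "name": "web_search",
|     "description": "Search the web and return result snippets.",
|     "input_schema": {
|         "type": "object",
|         "properties": {"query": {"type": "string"}},
|         "required": ["query"],
|     },
| }]
|
| msg = client.messages.create(
|     model="claude-3-7-sonnet-latest",
|     max_tokens=1024,
|     tools=tools,
|     messages=[{"role": "user", "content": "Who won the race?"}],
| )
|
| for block in msg.content:
|     if block.type == "tool_use":  # model asked for a search
|         results = run_web_search(block.input["query"])
|         # ...send results back as a tool_result message and loop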
| tene80i wrote:
| Do you think most people know how to do that, or even what it
| means? The market is larger than just software engineers.
| ChicagoDave wrote:
| There is targeted value in integrations, but everything still
| leads back to larger context windows.
|
| I love MCP (it's way better than plain Claude) but even that runs
| into context walls.
| davee5 wrote:
| I'm quite struck by the title of this announcement. The box being
| drawn around "your world" shows how narrow the AI builder's
| window into reality tends to be.
|
| > a new way to connect your apps and tools to Claude. We're also
| expanding... with an advanced mode that searches the web.
|
| The notion of software eating the world, and AI accelerating that
| trend, always seems to forget that The World is a vast thing, a
| physical thing, a thing that by its very nature can never be
| fully consumed by the relentless expansion of our digital
| experiences. Your worldview != the world.
|
| The cynic would suggest that the teams that build these tools
| should go touch grass, but I think that misses the mark. The real
| indictment is of the sort of thinking in which improvements to
| digital tools [intelligences?] in and of themselves can
| constitute truly substantial and far-reaching changes.
|
| The reach of any digital substrate is inherently limited, and this
| post unintentionally lays that bare. And while I hear
| accelerationists invoking "robots" as the means for digital
| agents to expand their potent impact deeper into the real world I
| suggest this is the retort of those who spend all day in apps,
| tools, and the web. The impacts and potential of AI are indeed
| enormous, but some perspective remains warranted and occasional
| injections of humility and context would probably do these teams
| some good.
| dang wrote:
| (Just for context: we've since changed the title above.
| Corporate press release titles are rarely a good fit for HN and
| we usually change them.
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&sor...
| )
| atonse wrote:
| I think with MCPs and related tech, if Apple just internally went
| back to the drawing board and integrated the concept of MCPs
| directly into iOS (via the "Apple Intelligence" umbrella) and
| seamlessly integrated it into the App Store and apps, they would
| win the mobile race for this.
|
| Being Apple, they would have to come up with something novel like
| they did with push (where you have _one_ OS process running that
| delegates to apps rather than every app trying to handle push
| themselves) rather than having 20 MCP servers running. But I
| think if they did this properly, it would be so amazing.
|
| I hope Apple is really re-thinking their absolutely comical start
| with AI. I hope they regroup and hit it out of the park (like how
| Google initially stumbled with Bard, but are now hitting it out
| of the park with Gemini).
| mattlondon wrote:
| Do you really think Apple can catch up with and then surpass
| all these SOTA AI labs?
|
| They bet big and got distracted on VR. It was obviously the
| wrong choice at the time, and even more so now. They're going
| to have to abandon all that VR crap and pivot hard to AI to try
| and catch up. I think the more likely case is they _can't_
| catch up now and will just have to end up licensing Gemini from
| Google/Google paying them to use Gemini as the default AI.
| atonse wrote:
| No I'm not saying Apple even has to build their own model.
| I'm saying Apple can build a stellar _product_ experience
| around it.
|
| As others have pointed out, if that's what App Intents are,
| have they started to integrate this as part of Apple
| Intelligence?
| _pdp_ wrote:
| Apple already has the equivalent of MCP.
| https://developer.apple.com/documentation/appintents.
| bloomca wrote:
| That's just App Intents. I don't think they lack data at this
| point; they just struggle with how to use that data at the OS
| level.
| cruffle_duffle wrote:
| The video demos never really showed the auth "story" but I assume
| that there is some oauth step to connect Claude with your MCP
| service, right?
| belter wrote:
| All these integrations are likely to cause a massive security
| leak sooner or later.
| OJFord wrote:
| Where's the permissioning, the data protection?
|
| People will say 'aaah ad company' (me too sometimes) but I'd
| honestly trust a Google AI tool with this way more. Not just
| because it already has access to my Google Workspace obviously,
| but just because it's a huge established tech firm with decades
| of experience in trying not to lose (or have taken) user data.
|
| Even if they get the permissions right and it can only read my
| stuff if I'm just asking it to 'research', now Anthropic has all
| that and a target on their backs. And I don't even know what 'all
| that' is, whatever it explored deeming it maybe useful.
|
| Maybe I'm just transitioning into old guy not savvy with latest
| tech, but I just can't trust any of this 'go off and do whatever
| seems correct or helpful with access to my filesystem/Google
| account/codebase/terminal' stuff.
|
| I like chat-only (well, +web) interactions where I control the
| input and taking the output, but even that is not an experience
| that gives me any confidence in giving uncontrolled access to
| stuff and it always doing something correct and reasonable. It's
| often confidently incorrect too! I wouldn't give an intern free
| rein in my shell either!
| joshwarwick15 wrote:
| Permissioning: OAuth. Data protection: local LLMs.
| weinzierl wrote:
| If you do not enable "Web Search" are you guaranteed it does not
| access the web anyway?
|
| Sometimes I want a pure model answer and I used to use Claude for
| that. For research tasks I preferred ChatGPT, but I found that
| you cannot reliably deny it web access. If you are asking it a
| research question, I am pretty sure it uses web search, even when
| _" Search"_ and _" Deep Research"_ are off.
| rafram wrote:
| Oh no, remote MCP servers. Security was nice while it lasted!
| rvz wrote:
| This is a fantastic time to get into the security space and
| trick all these LLMs into leaking sensitive data and make a lot
| of money out of that.
|
| MCP is a flawed spec and quite frankly a scam.
| knowaveragejoe wrote:
| What makes a remotely hosted MCP server less secure? The
| alternative, and what most of MCP consists of at the moment, is
| essentially running arbitrary code on your machine, as your
| user, and hooking this up to an LLM.
| rvz wrote:
| Can't wait for the first security incident relating to the
| fundamentally flawed MCP specification, in which an LLM is
| inadvertently tricked into leaking sensitive data.
|
| Increasing the number of "connections" to the LLM increases the
| risk of a leak, and it gives you more rope to hang yourself with
| when at least one connection becomes problematic.
|
| Now is a _great_ time to be a LLM security consultant.
| dimgl wrote:
| This is great, but can you fix Claude 3.7 and make it more like
| 3.5? I'm seriously disappointed with 3.7. It seems to be
| performing significantly worse for me on all tasks.
|
| Even my wife, who normally used Claude to create interesting
| recipes to bake cookies, has noticed a huge downgrade in 3.7.
| t0lo wrote:
| 3.7 seems to have way more filler and ambiguity and fewer
| insights for me.
| bjornsing wrote:
| The strategic business dynamic here is very interesting. We used
| to have "GPT-wrapper SaaS". I guess what we're about to see now
| is the opposite: "SaaS/MCP-wrapper GPTs".
| jimbokun wrote:
| The GPT wrappers were always going to be subsumed by
| improvements to the models themselves.
|
| LLMs wrapping the services makes more sense, as the data stored
| in those services adds a lot of value to off the shelf LLMs.
| bjornsing wrote:
| I think I agree. There's a lot of utility in a single LLM
| that can talk to many SaaS and integrate them. Feels like a
| better path forward than a separate LLM inside every SaaS.
| hdjjhhvvhga wrote:
| The people who connect an LLM to their PayPal and Cloudflare
| accounts perfectly deserve the consequences, both positive and
| negative.
| conroy wrote:
| Remote MCP servers are still in a strange space. Anthropic
| updated the MCP spec about a month ago with a new Streamable HTTP
| transport, but it doesn't appear that Claude supports that
| transport yet.
|
| When I hooked up our remote MCP server, Claude sends a GET
| request to the endpoint. According to the spec, clients that want
| to support both transports should first attempt to POST an
| InitializeRequest to the server URL. If that returns a 4xx, it
| should then assume the SSE integration.
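|
| The probe the spec describes is simple enough. A sketch with
| the requests library (the endpoint URL is a placeholder):
|
| import requests
|
| def detect_transport(url: str) -> str:
|     """Try the new Streamable HTTP transport first; fall back
|     to the older SSE transport if the POST gets a 4xx."""
|     init = {
|         "jsonrpc": "2.0",
|         "id": 1,
|         "method": "initialize",
|         "params": {
|             "protocolVersion": "2025-03-26",
|             "capabilities": {},
|             "clientInfo": {"name": "probe", "version": "0.1"},
|         },
|     }
|     resp = requests.post(
|         url,
|         json=init,
|         headers={"Accept": "application/json, text/event-stream"},
|         timeout=10,
|     )
|     if 400 <= resp.status_code < 500:
|         return "sse"  # server only speaks the old transport
|     resp.raise_for_status()
|     return "streamable-http"
|
| print(detect_transport("https://example.com/mcp"))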
| gonzan wrote:
| So there are going to be companies built on just an MCP server,
| I guess. I wonder what the first big one will be; just a matter
| of time, I think.
| worldsayshi wrote:
| Is it just me who would like to see more confirmation before
| making opaque changes to remote systems?
|
| I might not dare to add an integration if it can potentially add
| a bunch of stuff to the backing systems without my approval.
| Confirmations and review should be part of the protocol.
| sepositus wrote:
| Yeah, this was my first thought. I was watching the video of it
| creating all of these Jira tickets just thinking in my head: "I
| hope it just did all that correctly." I think the level of
| patience with my team would be very low if I started running an
| LLM that accidentally deleted a bunch of really important
| tickets.
| worldsayshi wrote:
| Yeah. Feels like it's breaking some fundamental UX principle.
| If an action is going to make any significant change, make
| sure that it fulfills _at least_ one of these:
|
| 1. Can be rolled back/undone
|
| 2. Clearly states exactly what it's going to do in a
| reviewable way
|
| If those aren't fulfilled, you're going to end up with users
| who are afraid of using your app.
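|
| Even a dumb gate in the MCP client would satisfy principle 2.
| A sketch (the readOnlyHint annotation is from the spec's tool
| hints mentioned upthread; everything else here is made up):
|
| def call_tool(tool: dict, args: dict, execute) -> object:
|     """Show a reviewable preview and require explicit consent
|     before any tool call that isn't marked read-only."""
|     hints = tool.get("annotations", {})
|     if not hints.get("readOnlyHint", False):
|         print(f"About to run {tool['name']} with:")
|         for k, v in args.items():
|             print(f"  {k} = {v!r}")
|         if input("Proceed? [y/N] ").strip().lower() != "y":
|             return None  # user declined; nothing was mutated
|     return execute(tool["name"], args)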
| todsacerdoti wrote:
| Check out 2500+ MCP servers at https://mcp.pipedream.com
| the_clarence wrote:
| Been playing with MCP in the last few days and it's basically a
| more streamlined way to define tools/function calls.
|
| That + OpenAI's agent SDK makes creating agentic flows so
| easy.
|
| On the other hand, you're kinda forced to run these tools / MCP
| servers in their own process, which makes no sense to me.
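|
| For reference, here's what that looks like with the official
| Python SDK (a sketch using its FastMCP helper; note it still
| runs as its own stdio process, which is the part I mean):
|
| from mcp.server.fastmcp import FastMCP
|
| mcp = FastMCP("demo")
|
| @mcp.tool()
| def add(a: int, b: int) -> int:
|     """Add two numbers."""
|     return a + b
|
| if __name__ == "__main__":
|     mcp.run()  # stdio transport by default: its own process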
| nilslice wrote:
| you might like mcp.run, a tool management platform we're
| working on... totally agree running a process per tool, with
| all kinds of permissions is nonsensical - and the move to
| "remote MCP" is a good one!
|
| but, we're taking it a step (or two) further, enabling you to
| dynamically build up a MCP server from other servers managed in
| your account with us.
|
| try it out, or let me get you a demo! this goes for any casual
| comment readers too ;)
|
| https://cal.com/team/dylibso/mcp.run-demo
| the_clarence wrote:
| I meant I wanted to run them synchronously in the same
| process :)
| nilslice wrote:
| that's what this does :)
|
| you bundle mcp servers into a profile, which acts as a
| single virtual mcp server and can be dynamically updated
| without re-configuring your mcp client (e.g. claude)
| kostas_f wrote:
| Anthropic's strategy seems to go towards "AI as universal glue".
| They want to tie Claude into all the tools teams already live in
| (Jira, Confluence, Zapier, etc.). That's a smart move for
| enterprise adoption, but it also feels like they're compensating
| for a plateau in core model capabilities.
|
| Both OpenAI and Google continue to push the frontier on
| reasoning, multimodality, and efficiency whereas Claude's recent
| releases have felt more iterative. I'd love to see Anthropic push
| into model research again.
| bl4ckneon wrote:
| I am sure they are already doing that. To think that an AI
| researcher is doing essentially API integration work is a bit
| silly. Multiple efforts can happen at the same time.
| kostas_f wrote:
| They certainly have internal research efforts underway, but
| I'm talking about what's actually been released to end users
| via the Claude app or API. Their latest public Sonnet release,
| 3.7 (Feb 2025), felt pretty incremental compared to Sonnet 3.5
| (June 2024), especially when you compare them to OpenAI's and
| Google's released models. In terms of the models you can
| integrate today, Anthropic hasn't quite kept pace on either
| reasoning performance or cost efficiency.
| freewizard wrote:
| I would expect Slack to do this. Maybe Slack and Claude should
| merge one day, given that MS and Google have their own core
| models.
| tjsk wrote:
| Slack is owned by Salesforce which is doing its own
| Agentforce stuff
| spacebanana7 wrote:
| Salesforce loves acquisitions. I can already picture
| Benioff's victory speech on CNBC.
| kostas_f wrote:
| Anthropic is now too expensive to be acquired. Only Amazon
| could be a potential buyer, given that out of the 3 big cloud
| providers, it's the only one without its own model
| offering.
| deanc wrote:
| I find it absolutely astonishing that Atlassian hasn't yet
| provided an LLM for Confluence instances, and that instead a
| third party is required. The sheer scale of documentation and
| information I've seen at some organisations I've worked with is
| overwhelming. This would be a killer feature. I do not recommend
| Confluence to my clients simply because the search is so
| appalling.
|
| Keyword search is such a naive approach to information discovery
| and information sharing - and renders Confluence in big orgs
| useless. Being able to discuss and ask questions is a more
| natural way of unpacking problems.
| artur_makly wrote:
| On their announcement page they wrote: "In addition to these
| updates, we're making WEB SEARCH available globally for all
| Claude users on paid plans."
|
| So I tested a basic prompt:
|
| 1. go to : SOME URL
|
| 2. copy all the content found VERBATIM, and show me all that
| content as markdown here.
|
| Result: it FAILED miserably with a few basic HTML pages - it
| simply is not loading all the page content in its internal
| browser.
|
| What worked well:
|
| - Gemini 2.5 Pro (Experimental)
|
| - GPT 4o-mini
|
| - Gemini 2.0 Flash (not verbatim but summarized)
| meander_water wrote:
| Looks like this is possible due to the relatively recent addition
| of OAuth2.1 to the MCP spec [0] to allow secure comms to remote
| servers.
|
| However, there's a major concern that server hosters are on the
| hook to implement authorization. Ongoing discussion here [1].
|
| [0] https://modelcontextprotocol.io/specification/2025-03-26
|
| [1]
| https://github.com/modelcontextprotocol/modelcontextprotocol...
| dmarble wrote:
| Direct link to the spec page on authorization:
| https://modelcontextprotocol.io/specification/2025-03-26/bas...
|
| Source:
| https://github.com/modelcontextprotocol/modelcontextprotocol...
| marifjeren wrote:
| That github issue is closed but:
|
| > major concern that server hosters are on the hook to
| implement authorization
|
| Doesn't it make perfect sense for server hosters to implement
| that? If Claude wants access to my Jira instance on my behalf,
| and Jira hosts a remote MCP server that aids in exposing the
| resources I own, isn't it obvious Jira should be responsible
| for authorization?
|
| How else would they do it?
| cruffle_duffle wrote:
| The authorization server and resource server can be separate
| entities. Meaning that the Jira instance can validate the token
| without being the one issuing it or handling credentials.
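|
| i.e. the resource server just validates tokens against the
| issuer's published keys. A PyJWT sketch (the issuer/audience
| values are placeholders):
|
| import jwt  # PyJWT
|
| ISSUER = "https://auth.example.com"        # separate auth server
| AUDIENCE = "https://jira.example.com/mcp"  # this resource server
|
| jwks = jwt.PyJWKClient(f"{ISSUER}/.well-known/jwks.json")
|
| def validate(token: str) -> dict:
|     """Check a bearer token minted elsewhere: signature via the
|     issuer's JWKS plus issuer/audience/expiry claims. No
|     credentials are handled here."""
|     key = jwks.get_signing_key_from_jwt(token).key
|     return jwt.decode(
|         token, key,
|         algorithms=["RS256"],
|         issuer=ISSUER,
|         audience=AUDIENCE,
|     )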
| marifjeren wrote:
| Yes, this is true of OAuth, which is exactly what the
| latest Model context protocol is using.. What's the concern
| again?
|
| I guess maybe you are saying the onus is NOT on the MCP
| server but on the authorization server.
|
| Anyway while technically true this is mostly just
| distracting because:
|
| 1. in my experience the resource server and the
| authorization server are almost always maintained by the
| same company -- Jira/Atlassian being an example
|
| 2. the resource server still minimally has the
| responsibility of identifying and integrating with some
| authorization server, and *someone* has to be the
| authorization server, so I'm not sure deferring the
| responsibility to that unidentified party is a strong
| defense against the critique anyway. The strong defense is:
| of course the MCP server should have these
| responsibilities.
| meander_water wrote:
| I think the pain points will be mostly for enterprise
| customers who want to integrate servers into their auth
| systems.
|
| For example, say you have a JIRA self hosted instance
| with SSO to entra id. You can't just install an MCP
| server off the shelf because authZ and resources are
| tightly coupled and implementation specific. It would be
| much easier if the server only handled providing
| resources, and authZ was offloaded to a provider of your
| choosing.
| marifjeren wrote:
| I'm under the impression that what you described is
| exactly how the new model context protocol works, since
| it's using oauth and is therefore unaware of any of the
| authentication (eg SSO) details. Your authentication
| process could be done via carrier pigeon and Claude would
| be none the wiser.
| halter73 wrote:
| That github issue is closed because it's been mostly
| completed. As of https://github.com/modelcontextprotocol/modelcontextprotocol...,
| the latest draft specification does not require the resource
| server to act as or proxy to the IdP. It just hasn't made its
| way to a ratified spec yet, but SDKs are already implementing
| the draft.
| bdd_pomerium wrote:
| This is very cool. Integrations look slick. Folks are
| understandably hyped--the potential for agents doing "deep
| research-style" work across broad data sources is real.
|
| But the thread's security concerns--permissions, data protection,
| trust--are dead on. There is also a major authN/Z gap, especially
| for orgs that want MCP to access internal tools, not just curated
| SaaS.
|
| Pushing complex auth logic (OAuth scopes, policy rules) into
| every MCP tool feels backwards.
|
| * Access-control sprawl. Each tool reinvents security. Audits get
| messy fast.
|
| * Static scopes vs. agent drift. Agents chain calls in ways no
| upfront scope list can predict. We need per-call context checks.
|
| * Zero-Trust principles mismatch. Central policy enforcement is
| the point. Fragmenting it kills visibility and consistency.
|
| We already see the cost of fragmented auth: supply-chain hits and
| credential reuse blowing up multiple tenants. Agents only raise
| the stakes.
|
| I think a better path (and one that, in full disclosure, we're
| actively working on at Pomerium) is to have:
|
| * One single access point in front of all MCP resources.
|
| * Single sign-on once, then short-lived signed claims flow
| downstream.
|
| * AuthN separated from AuthZ with a centralized policy engine
| that evaluates every request, deny-by-default (a toy sketch
| follows below). Evaluation in both directions with hooks for
| DLP.
|
| * Unified management, telemetry, audit log and policy surface.
|
| I'm really excited about what MCP is putting us in the direction
| of being able to do with agents.
|
| But without a higher level way to secure and manage the access,
| I'm afraid we'll spend years patching holes tool by tool.
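|
| To make deny-by-default concrete, the core of the gateway check
| can be as small as the toy sketch below (identities, tool names,
| and the policy table are all hypothetical; real policy engines
| are much richer):
|
| # Hypothetical central gateway: every tool call is denied
| # unless a policy rule explicitly allows it for this identity.
| POLICY = {
|     ("analyst@example.com", "jira.search"): "allow",
|     ("analyst@example.com", "jira.create_issue"): "allow",
| }
|
| def audit_log(user: str, tool: str, decision: str) -> None:
|     print(f"audit: {user} -> {tool}: {decision}")
|
| def authorize(user: str, tool: str) -> bool:
|     """Deny-by-default: the absence of a rule means no."""
|     decision = POLICY.get((user, tool), "deny")
|     audit_log(user, tool, decision)  # unified audit trail
|     return decision == "allow"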
| tkgally wrote:
| For the past couple of months, I've been running occasional side-
| by-side tests of the deep research products from OpenAI, Google,
| Perplexity, DeepSeek, and others. Ever since Google upgraded its
| deep research model to Gemini 2.5 Pro Experimental, it has been
| the best for the tasks I give them, followed closely by OpenAI.
| The others were far behind.
|
| I ran two of the same prompts just now through Anthropic's new
| Advanced Research. The results for it and for ChatGPT and Gemini
| appear below. Opinions might vary, but for my purposes Gemini is
| still the best. Claude's responses were too short and simple and
| they didn't follow the prompt as closely as I would have liked.
|
| Writing conventions in Japanese and English
|
| https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-0...
|
| https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMH...
|
| https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7...
|
| Overview of an industry in Japan
|
| https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e...
|
| https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3...
|
| https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7...
|
| The second task, by the way, is just a hypothetical case. Though
| I have worked as a translator in Japan for many years, I am not
| the person described in the prompt.
| noisy_boy wrote:
| What is the best stack/platform to get started with MCP? I'm
| talking in terms of ergonomics, features and popularity.
| jngiam1 wrote:
| This is awesome. We implemented an MCP client that's fully
| compatible with the new remote MCP specs, supports OAuth and all.
| It's really smooth and I think paves the way for AI to work with
| tools. https://lutra.ai/mcp
| jes5199 wrote:
| the MCP spec as it stands today is pretty half-baked. It's
| clear that the first edition was trying to emulate STDIO over
| HTTP, but that meant holding open a connection indefinitely. The
| new revision tries to solve this by letting you hold open as many
| connections as you want! but that makes it vague about message
| delivery ordering when you have multiple streams open. There even
| seems to be part of the spec that is logically impossible -
| people are wrestling with it in the GitHub issues.
|
| which is to say: I'm not sure it actually wins, technically, over
| the OpenAI/OpenAPI idea from last year, which was at least easy
| to understand
| sagarpatil wrote:
| Should have just called it Remote MCP. Integrations sounds very
| vague.
| Surac wrote:
| I often use Claude 3.7 on programming things never done before.
| Even extensive searching on the web brings up zero hits. I
| understand that this is very uncommon, but my work portfolio is
| more science than real programming. Claude 3.7 really "thinks"
| about the questions I ask. But 3.5 regularly drifts into dream
| mode if asked anything beyond its training data. So if you ask
| for code easily found on the web you will see no difference. Try
| asking things not so common and you will see a difference.
| myflash13 wrote:
| Finally I can do something simple that I've wanted to do for
| ages: paste in a poster image or description of an event and tell
| the AI to add it to my calendar.
| gjohnhazel wrote:
| I just have it create an .ics file and open that
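|
| Which is easy to script as well. A minimal sketch of the file
| the model writes (the event fields here are made up):
|
| from datetime import datetime, timedelta
|
| def make_ics(summary: str, start: datetime, hours: int = 1) -> str:
|     """Emit a minimal single-event iCalendar file."""
|     fmt = "%Y%m%dT%H%M%S"
|     end = start + timedelta(hours=hours)
|     return "\r\n".join([
|         "BEGIN:VCALENDAR",
|         "VERSION:2.0",
|         "PRODID:-//example//ics-sketch//EN",
|         "BEGIN:VEVENT",
|         f"UID:{int(start.timestamp())}@example",
|         f"DTSTART:{start.strftime(fmt)}",
|         f"DTEND:{end.strftime(fmt)}",
|         f"SUMMARY:{summary}",
|         "END:VEVENT",
|         "END:VCALENDAR",
|     ])
|
| with open("event.ics", "w", newline="") as f:
|     f.write(make_ics("Poster talk", datetime(2025, 5, 10, 18, 0)))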
| franze wrote:
| > Integrations and advanced Research are now available in beta on
| the Max, Team, and Enterprise plans, and will soon be available
| on Pro.
| MarkMarine wrote:
| The Plaid integration is just to let you look at your install? I
| was excited to see all my accounts (as a consumer) knit together
| and reported on by Claude. Bummer.
| sebstefan wrote:
| An AI capable of responding to a "How do I do X" prompt
| with "Hey, this seems related to a ticket that was already opened
| on your Jira 2 months ago", or "There is a document about this in
| Sharepoint", would bring me such immense value, I think I
| might cry.
|
| Edit: Actually right in the tickets themselves would probably be
| better and not require MCP... but still
| MagicMoonlight wrote:
| Copilot can already be set up to use SharePoint etc. And you can
| set it up to only respond based on internal content.
|
| So if you ask it "who is in charge of marketing", it will read
| it off SharePoint instead of answering generically.
| elia_42 wrote:
| Very interesting. The integration videos are great to start right
| away and try out the new features. The extensions of the deep
| reasoning capabilities are also incredible.
|
| I think we are coming to a new automated technology ecosystem
| where LLMs will orchestrate many different parts of software with
| each other, speeding up the launch, evolution and monitoring of
| products.
| abhisek wrote:
| Looks to me like another app ecosystem is coming up, similar to
| Android or iPhone. We are probably going to see a lot of AI app
| marketplaces that solve the problems of discovery, billing &
| integration with AI hosts like Claude Desktop.
| game_the0ry wrote:
| It's only a matter of time before folks write user stories and
| an LLM takes over the first draft, then they iterate from there.
|
| Btw, that speaks to how important it is to get clear business
| requirements for work.
| clintonb wrote:
| Greptile (https://www.greptile.com/) tries to do that, at least
| for bug tickets. I recall being annoyed by its suggestions
| (posted as Linear comments).
| dakshgupta wrote:
| Co-founder of Greptile - that was a bad feature that we have
| since deprecated to focus entirely on AI code reviews.
| ausbah wrote:
| Usability like this seems to be a big nail in the coffin for
| OSS LLM usage.
___________________________________________________________________
(page generated 2025-05-02 23:02 UTC)