[HN Gopher] Amazon to invest another $4B in Anthropic
___________________________________________________________________
Amazon to invest another $4B in Anthropic
Author : swyx
Score : 636 points
Date : 2024-11-22 16:25 UTC (1 day ago)
(HTM) web link (www.cnbc.com)
(TXT) w3m dump (www.cnbc.com)
| aliasxneo wrote:
| > Amazon Web Services will also become Anthropic's "primary cloud
| and training partner," according to a blog post. From now on,
| Anthropic will use AWS Trainium and Inferentia chips to train and
| deploy its largest AI models.
|
| I suspect that's worth more than $4B in the long term? I'm not
| familiar with the costs, though.
| devjab wrote:
| I've been impressed with the AI assisted tooling for the
| various monitoring systems in Azure at least. Of course this is
| mainly because those tools are so ridiculously hard to use that
| I basically can't use them for a lot of things. The AI does it
| impressively well, though.
|
| I'd assume there is a big benefit to having AI assisted
| resource generation for cloud vendors. Our developers often
| have to mess around with things that we really, really,
| shouldn't in Azure because operations lacks the resources and
| knowledge. Technically we've outsourced it, but most requests
| take 3 months and get done wrong... if an AI could generate our
| network settings from a global policy that would be excellent.
| Hell, if it could handle all our resource generation, there
| would be so much less useless time wasted, because our
| organisation views "IT" as HR's uncharming cost-center cousin.
| senderista wrote:
| Inferentia...Bollocks
|
| Sorry.
| liquidise wrote:
| Can someone with familiarity in rounds close to this size speak
| to their terms?
|
| For instance: I imagine a significant part of this will be "paid"
| as AWS credits and is not going to be reflected as a balance in a
| bank account transfer.
| uptownfunk wrote:
| Yes, that is the case. It is largely $4B in capex investment;
| I'd imagine 10% or less is cash. One would think Nvidia could
| get much better terms investing its GPUs (assuming it can get
| them into a working cluster). Instead, Nvidia gets cash for
| GPU hardware, that hardware gets put into a data center, and
| AWS invests its hardware as credits for equity instead of
| cash. And because AWS has already built out its data center
| infra, it can get a better deal than Nvidia making the play,
| because Nvidia would have to build an entire data center
| infra from scratch (in addition to designing GPUs etc.).
|
| Now if AWS or GCP can crack GPU compute better than Nvidia for
| training and hosting, then they can basically cut out Nvidia
| and essentially get GPUs at cost (vs. whatever markup they
| currently pay to Nvidia).
|
| Because essentially whatever return AWS makes from Anthropic
| will be modulated by the premiums paid to Nvidia to invest,
| and also by the cost of operating a data center for
| Anthropic.
|
| But thankfully all of that gets mediated on paper, because
| the valuation is more speculative than the returns on Nvidia
| hardware (which will be known to the cent by AWS, given it's
| some math of hourly rate and utilization, which they have a
| good idea of).
| maxclark wrote:
| Is this really a $4B investment, or credits on AWS?
|
| AWS margins are close to 40%, so the real cost of this
| "investment" would be way less than the press release.
| swyx wrote:
| https://techcrunch.com/2024/11/22/anthropic-raises-an-additi...
|
| > "This new CASH infusion brings Amazon's total investment in
| Anthropic to $8 billion while maintaining the tech giant's
| position as a minority investor, Anthropic said."
| mistrial9 wrote:
| ok but how much cash, really.. looks ambiguous.
|
| ps- plenty of people turning a blind eye towards rampant
| valuation inflation and "big words" statements on deals.
| Where is the grounding on the same dollars that are used at a
| grocery store? The whole thing is fodder for instability in a
| big way IMHO
| Etheryte wrote:
| I don't really see any ambiguity? If the reporting is
| accurate, the whole $4B is cash.
| mef wrote:
| whether cash or credit, it's all going right back to AWS
| hehehheh wrote:
| This is great for creative accounting. AWS now has 4bn in
| equity and 4bn in additional sales.
| lucianbr wrote:
| So it's a $2.4B investment, announced as $4B.
|
| Significantly less, still a huge investment.
|
| I look forward to the moment the sunk cost fallacy shows up.
| "We've invested $20B into this, and nothing yet. Shall we
| invest $4B more? Maybe it will actually return something this
| time." That will be fun.
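The margin arithmetic running through this subthread (a ~40% AWS margin turning a $4B headline into roughly $2.4B of real cost) can be sketched in a few lines; note the 40% margin is the commenters' assumption, not a disclosed figure:

```python
def effective_cost_of_credits(face_value: float, gross_margin: float) -> float:
    """Cost to the cloud provider of granting `face_value` in credits,
    assuming the credits are consumed at list price and the provider
    earns `gross_margin` on that usage."""
    return face_value * (1.0 - gross_margin)

investment = 4_000_000_000   # the $4B headline number
margin = 0.40                # commenters' assumed AWS gross margin

cost = effective_cost_of_credits(investment, margin)
print(f"${cost / 1e9:.1f}B")  # prints "$2.4B" -- the figure cited above
```

This ignores the equity stake and any negotiated discount, so it is only the floor of the argument being made here.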
| hehehheh wrote:
| It could be that the Anthropic models make Bedrock attractive
| and profitable and, more importantly, medium-term competitive
| against Azure. It seems worth it.
| mikeocool wrote:
| Curious if anyone knows the logistics of these cloud provider/AI
| company deals. In this case, it seems like the terms of the deal
| mean that Anthropic ends up spending most of the investment on
| AWS to pay for training.
|
| Does Anthropic basically get at-cost pricing on AWS? If Amazon
| has any margin on their pricing, it seems like this $4B
| investment ends up costing them a lot less, and this is a nice
| way to turn a cap ex investment into AWS revenue.
| aiinnyc wrote:
| One hand washes the other.
| tyre wrote:
| Yes exactly.
|
| This was the brilliance of the original MSFT investment into
| OpenAI. It was an investment in Azure scaling its AI training
| infra, but roundabout through a massive customer (exactly what
| you'd want as a design partner) and getting equity.
|
| I'm sure Anthropic negotiated a great deal on their largest
| cost center, while Amazon gets a huge customer to build out
| their system with.
| whatshisface wrote:
| This explanation makes no sense, I could be AWS' biggest
| customer if they wanted to pay me for it. Something a little
| closer could be that the big tech companies wanted to acquire
| outside LLMs, not quite realizing that spending $1B on
| training only puts you $1B ahead.
| raverbashing wrote:
| Yes but Amazon is not making extra money with you being
| their biggest customer
|
| With Anthropic yes
| whatshisface wrote:
| Anthropic is getting $4B in investment in a year where
| their revenue was about $850M. Even if Amazon had bought
| them outright for that much, they would not be ahead. The
| fact that everybody keeps repeating the claim that Amazon
| is "making money" makes this appear like some kind of
| scam.
| surgical_fire wrote:
| It appears to be a scam because it sort of is.
|
| AI needs to be propped up because the big tech cloud
| providers they depend on need AI to be a thing to justify
| their valuations. Tech is going through a bit of a slump
| where all things being hyped a few years ago sort of died
| down (crypto? VR? Voice assistants? Metaverse?). Nobody
| gets very hyped about any of those nowadays. I am
| probably forgetting a couple of hyped things that fizzled
| out over the years.
|
| Case in point, as much as I despise Apple, they are not
| all-in the AI bandwagon because it does nothing for them.
| vineyardmike wrote:
| Go look at earnings reports for big tech companies. AI is
| definitely driving incremental revenue.
|
| Apple is definitely on the AI bandwagon, they just have a
| different business model and they're very disciplined.
| Apple tends not to increase research and investment costs
| faster than revenue growth. You'll also notice rumors that
| they're scaling back their self-driving-car and VR research
| goals.
| surgical_fire wrote:
| > Go look at earnings reports for big tech companies. AI
| is definitely driving incremental revenue.
|
| Yes. Which proves my point.
| vineyardmike wrote:
| Google Cloud revenue up 35% thanks to AI products
| [1,4,5]. Azure sales are up by a similar amount (but only 12%
| of that was AI products [2]). AWS is up too [3].
|
| I'm so glad your point was that it's not a scam, and there
| are billions of dollars in real sales occurring at a
| variety of companies. It's amazing what publicly traded
| companies disclose if we only bother to read it. I'm glad
| we're all not in the contrarian bubble where we have to
| hate anything with hype.
|
| 1. https://technologymagazine.com/articles/how-ai-surged-
| google...
|
| 2. https://siliconangle.com/2024/10/30/microsofts-ai-bet-
| pays-o...
|
| 3. https://www.ciodive.com/news/AWS-cloud-revenue-growth-
| AI-dem...
|
| 4. https://www.reuters.com/technology/google-parent-
| alphabet-be...
|
| 5. https://fortune.com/2024/10/29/google-q3-earnings-
| alphabet-s...
| surgical_fire wrote:
| > In so glad your point was that it's not a scam
|
| Except it sort of is. It needs AI to be hyped and propped
| up, so that all those silly companies spending in GCP can
| continue to do so for a wee bit longer.
| dartos wrote:
| I don't know if that makes it a scam.
|
| I think you're putting the cart before the horse.
|
| Big cloud providers will push anything that would make
| them money. That's just what marketing is.
|
| AI was exciting long before big cloud providers even
| existed. Once it was clear that a product could be made,
| they started marketing it and selling the compute needed.
|
| What's the scam?
| surgical_fire wrote:
| Crypto was exciting too. And metaverse. And VR. And voice
| assistants. Et cetera and so forth.
|
| All those things would change the world, and nothing
| would ever be the same, and would disrupt everything.
| Except they wouldn't and they didn't.
|
| The scam is that those companies don't want to be seen as
| mature companies, they need to justify valuations of
| growth companies, forever. So something must always go
| into the hype pyre.
|
| By all means, I hope the scam goes on for longer, as it
| indirectly benefits me too. But I don't have it in my
| heart to be a hypocrite. I will call a pig a pig.
| dartos wrote:
| I think AI isn't the same as crypto or metaverse.
|
| The LLMs and image generation models have obvious
| utility. They're not AGI or anything wild like that, but
| they are legitimately useful, unlike crypto.
|
| VR didn't fail, it just wasn't viral. Current VR
| platforms are still young. The internet commercially
| crashed in 2001 (the dot-com bust), but look at it now.
|
| Crypto the industry, imo, is a big pyramid scheme. The
| technology has some interesting properties, but the
| industry is scammy for sure.
|
| Metaverse wasn't even an industry, it was a buzzword for
| MMOs during a time when everyone was locked at home. Not
| really interesting.
|
| I don't think it's wise to lump every market boom
| together. Not everything is a scam.
| fakedang wrote:
| People are losing jobs because of AI. Like it or not, as
| imperfect as AI may be, AI is having a real world
| disruptive impact, however negative it may be. Customer
| service teams and call centers are already being affected
| by AI, and if they aren't being smart about it, being
| rendered obsolete.
|
| A lot of folks here seem to look at AI through examples
| of YC companies apparently. Step back and look instead at
| the kind of projects technology consultancies are taking
| up instead - they are real world examples of AI
| applications, many of which don't even involve LLMs but
| other aspects such as TTS/STT, image generation,
| transcription, video editing, etc. Way too many
| freelancers have begun complaining about how their
| pipelines have been zilch in the past two years.
| dartos wrote:
| There are also a lot of macroeconomic changes making
| hiring contractors (or anyone, really) a less attractive
| option at least in the US.
| surgical_fire wrote:
| That was, perhaps, the only good retort made so far. Yes,
| call centers and customer service are being affected,
| although it is unclear to me if the cost-benefit makes
| sense once AI stops being heavily subsidized. I may be
| wrong, but my impression is that AI companies bleed money
| not only on training but on running the models, and the
| actual cost of those services will need to be
| substantially higher than it is now for them to make
| sense.
| MVissers wrote:
| Price dropping is just a matter of time. Compute gets
| cheaper and the models get better. We've seen 100x drop
| in price for same capabilities in ~2 years.
|
| Don't forget about writers and designers losing jobs as
| well. If you're not absolute top and don't use AI, AI
| will replace you.
| jacobsimon wrote:
| I think the implication of the top comment is that cloud
| providers are buying revenue. When we say that cloud
| provider revenue is "up due to AI", a large part of that
| growth may be their own money coming back to them through
| these investments. Nvidia has been doing the same thing,
| by loaning data centers money to buy their chips.
| Essentially these companies are loaning each other huge
| sums of money and representing the resulting income as
| revenue rather than loan repayments.
|
| To be clear, it's not to say that AI itself is a scam,
| but that the finance departments are kind of
| misrepresenting the revenue on their balance sheets, and
| that may be securities fraud.
| herval wrote:
| > Case in point, as much as I despise Apple, they are not
| all-in the AI bandwagon because it does nothing for them.
|
| not sure if you've been paying attention, but AI is
| literally _the only thing_ Apple talks about these days.
| They literally released _an entire generation of devices_
| where the only new thing is "Apple Intelligence"
| staticman2 wrote:
| Is Apple investing in AI as much as Google, Meta,
| Microsoft, and xAI? If not they are not "all in".
| herval wrote:
| They don't disclose it, but I'd imagine so. They also
| admit to being a couple of years late, so they're
| accelerating (as per their last earnings call)
| adgjlsfhk1 wrote:
| they are investing differently. Apple has a much more
| captive audience than the others, and as such is focused
| on AI services that can be run on device. as such, they
| aren't doing the blessing edge foundation modern
| research, but instead putting a ton of money into
| productionizing smaller models that can be run without
| giant cloud compute.
| herval wrote:
| Trivia: not sure if you're aware, but there are billion-
| dollar companies in all these spaces you claim "nobody
| cares about". Every single stock broker in the US trades
| crypto now. Omniverse earns Nvidia a ton of money, Apple
| earned a billion dollars with a clunky v1 and Meta is
| selling more and more Quests every half.
| surgical_fire wrote:
| Yeah. Was not really a world changer as it was claimed to
| be during hype cycle.
|
| A billion-dollar valuation for a company in a given space
| is not as impressive as you think it is. Do I need to
| mention some high profile companies with stellar
| valuations that are sort of a joke now? We can work
| together on this ;)
| asadotzler wrote:
| Apple has spent over $10B on AVP and made back less than
| 10% of that with no signs of improvement in the next year
| or two and continued big spending on dev and content.
|
| Meta has spent over $50B on Quest and the Metaverse with
| fewer than 10M MAU to show for it.
|
| If you think those are successes, I'll go out and get
| several bridges to sell you. Meet me here tomorrow with
| cash.
| vineyardmike wrote:
| These sort of investments usually also contain licensing
| deals.
|
| Amazon probably gets Anthropic models they can resell
| "for free". The 850M revenue is Anthropic's, but there is
| incremental additional revenue to AWS's hosted model
| services. AWS was already doing lots of things with
| Anthropic models, and this may alter the terms more in
| Amazon's favor.
|
| Are they actually _making money_? I don't know,
| investments aren't usually profitable on day one. Is this
| an opportunity for more AWS revenue in the future?
| Probably.
| celestialcheese wrote:
| And access to use anthropics models internally, where you
| have some guarantees and oversight that your corp and
| customer data aren't leaking where you don't want it to.
| mbesto wrote:
| This is not how it works.
|
| First, revenue is irrelevant.
|
| Second, the investment isn't a loan that they need to
| repay. They are getting equity.
|
| Third, Anthropic is exclusively using AWS to train its
| models. Which, yes, means if AWS gives them $4B and it
| costs them $500M/year to pay for AWS services then after
| 8 years, the cash is a wash. However this ignores the
| second point.
|
| Fourth, there is brand association for someone who wanted
| to run their own single tenant instance of Claude whereby
| you would say "well they train Claude on AWS, so that
| must be the best place to run it for our <insert
| Enterprise org>" similar to OpenAI on Azure.
|
| Fifth, raising money is a signaling exercise to larger
| markets who want to know "will this company exist in 5
| years?"
|
| Sixth, AWS doesn't have its own LLM (relative to Meta,
| MS, etc.). The market will associate Claude with Amazon
| now.
| warkdarrior wrote:
| > Sixth, AWS doesn't have its own LLM (relative to Meta,
| MS, etc.). The market will associate Claude with Amazon
| now.
|
| Amazon/AWS has their line of Titan LLMs:
| https://aws.amazon.com/bedrock/titan/
| mbesto wrote:
| Fair. I wasn't aware of that, for the same reason that if
| you search Titan vs Claude on HN, you'll find way more
| mentions of Claude:
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
| que...
|
| https://hn.algolia.com/?dateRange=all&page=0&prefix=true&
| que...
|
| I think it's fair to say this is also a hedging strategy,
| then.
| whatshisface wrote:
| The difference between things you'd say like "it's true
| that..." and "the market will associate..." basically is
| the definition of a scam.
| mbesto wrote:
| Ummm okay? A scam implies someone is getting hurt
| (financially, emotionally, etc.). Who's getting scammed
| here?
| whatshisface wrote:
| The big tech companies are spending enormous amounts for
| part ownership in startups whose only assets are
| knowledge that exists in the public domain, systems that
| the companies could have engineered themselves, and model
| weights trained with the buyer's own capital. The people
| who will get hurt are public investors who are having
| their investment used to make a few startup people really
| rich.
| prewett wrote:
| > whose only assets are knowledge
|
| Knowledge is quite the useful asset, and not easily
| obtained. People obtain knowledge by studying for years
| and years, and even then, one might obtain information
| rather than knowledge, or have some incorrect knowledge.
| The AI companies have engineered a system that (by your
| argument) distills knowledge from artifacts (books,
| blogs, etc.) that contain statements, filler, opinions,
| facts, misleading arguments, incorrect arguments, as well
| as knowledge and perhaps even wisdom. Apparently this
| takes hundreds of millions of dollars (at least) to do
| for one model. But, assuming they actually _have_
| distilled out knowledge, that would be valuable.
|
| Although, since the barrier to entry is pretty low, they
| should not expect sustained high profits. (The barrier is
| costly, but so is the barrier to entry to new airlines--a
| few planes cost as much as an AI model--yet new airlines
| start up regularly and nobody really makes much profit.
| Hence, I conclude that requiring a large amount of money
| is not necessarily a barrier to entry.)
|
| (Also, I argue that they have _not_ actually distilled
| out knowledge, they have merely created a system that is
| about as good at word association as the average human.
| This is not knowledge, although it may have its own
| uses.)
| PittleyDunkin wrote:
| If the scam only hurts investors i'd say it's likely a
| net benefit to humanity.
| kelnos wrote:
| If they could build it themselves, why haven't they? Say
| what you want about Amazon, but I find it hard to believe
| that Anthropic bamboozled them into believing they can't
| build their own AI when they could do it cheaper.
| Spooky23 wrote:
| It's not a scam at all. Amazon doesn't have an AI story.
| So they invest in Anthropic, get a lot of that money back
| as revenue that seeds demand.
|
| Their customers now have an incentive to do AI in AWS.
| That drives more revenue for AWS.
| donavanm wrote:
| > Amazon doesn't have an AI story.
|
| A quibble: AWS _does_ have an AI story (which I was
| originally dismissive of): Bedrock as a common interface
| and platform to access your model of choice, plus
| niceties for fine-tuning/embeddings/customization etc.
| Unlike, say, Azure, they're not betting on _an_
| implementation. They're betting that competition/results
| between models will trend towards parity with limited
| fundamental differentiation. It's a bet on enterprises
| wanting the _functionality_ more generally and being able
| to ramp up that usage via AWS spend.
|
| WRT Titan, my view is that it's 1) production R&D to stay
| "in the game" and 2) a path towards commoditization and
| lower structural costs, which companies will need if
| these capabilities are going to stick/have ROI in
| low-cost transactions.
| Spooky23 wrote:
| Sure they do, but it doesn't have a ton of traction
| relative to the size of AWS.
| throwup238 wrote:
| Last I checked, AWS _reserve_ pricing for one year of an
| 8x H100 pod costs more than just buying the pod yourself
| (with tens of thousands left over per server for the
| NVIDIA enterprise license and to hire people to manage
| them). On demand pricing is even worse.
|
| This is essentially money that they would have spent to
| build out their cloud anyway, except now they also get
| equity in Anthropic. Whether or not Anthropic survives,
| AWS gets to keep all of those expensive GPUs and sell
| them to other customers so their medium/long term
| opportunity cost is small. Even if the deal includes
| cheaper rates the hardware still amortizes over 2-3
| years, and cloud providers are running plenty of 5+ year
| old GPUs so there's lots of money to be made in the long
| tail (as long as ML demand keeps up).
|
| They're not making money yet because there's the $4
| billion opportunity cost, but even if their equity in
| Anthropic drops to zero, they're probably still going to
| make a profit on the deal. If the equity is worth
| something, they'll make significantly more money than
| they could have renting servers. Throw financial
| engineering on top of that, and they may come out far
| ahead regardless of what happens to Anthropic: Schedule K
| capital equipment amortizations are treated differently
| from investments and AFAICT they can double dip since
| Anthropic is going to spend most of it on cloud (IANAL).
| That's likely why this seems to be cash investment
| instead of in-kind credits.
|
| I think that's what people mean when they say Amazon is
| making money off the deal. It's not an all or nothing VC
| investment that requires a 2-3x exit to be profitable
| because the money just goes back to AWS's balance sheet.
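The rent-vs-buy claim above can be made concrete with a toy break-even calculation. Both numbers below are hypothetical placeholders for illustration, not actual AWS or NVIDIA pricing:

```python
def breakeven_months(purchase_price: float, monthly_rental: float) -> float:
    """Months of rental revenue needed to recover the hardware purchase
    price (ignoring power, cooling, staffing, and financing costs)."""
    return purchase_price / monthly_rental

# Hypothetical figures for an 8x H100 server -- assumptions, not quotes.
pod_price = 300_000   # assumed purchase price, USD
rental = 15_000       # assumed monthly reserve-rate revenue, USD

months = breakeven_months(pod_price, rental)
print(f"breaks even after {months:.0f} months")
```

Under these made-up numbers the hardware pays for itself well inside a typical 2-3 year amortization window, which is the "long tail" point the comment is making.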
| AgentOrange1234 wrote:
| Yes and it's also interesting that they mention using
| Trainium to do the training. I don't know how much spend
| that is, but it seems really interesting. Like, if you're
| AWS, and you imagine competing in the long run with
| NVIDIA for AI chips, you need to fund all that silicon
| development.
| axpy906 wrote:
| They mentioned that in the last investment too. That
| seems like marketing to me, as no one is doing bleeding-
| edge research outside of the NVIDIA CUDA ecosystem.
| phillipcarter wrote:
| This is a way to keep the money printer called AWS
| Bedrock going and going and going. Don't underestimate
| the behemoth enterprises in the AWS rolodex who are all
| but assured to use that service for the next 5+ years at
| high volume.
| wepple wrote:
| Wow, I had no idea Anthropic was doing $850m revenue.
|
| I know they have high costs, but as a startup that's some
| phenomenal income and validation that they're not pure
| speculation like most startups are
|
| Edit: founded in 2021 and with 1000 employees. That's
| just wild growth.
| wcunning wrote:
| That's honestly one of the hardest things in engineering --
| identifying not just a customer to drive requirements, but a
| knowledgeable customer who can drive good requirements that
| work for a broader user base and can drive further expansion.
| Anthropic seems ideal for that, plus they act as a
| service/API provider on AWS.
| antupis wrote:
| Yeah, working with a knowledgeable customer is like magic.
| dzonga wrote:
| Or simply one of the best corporate buyouts that's not
| subject to regulatory scrutiny. Microsoft owns 49% of OpenAI
| and will get profits till whenever, all without being subject
| to regulatory approval. And they get to improve Azure.
| rty32 wrote:
| A caveat -- FTC is currently looking into the deal between
| Microsoft and OpenAI.
| fny wrote:
| And Amazon can always build their own LLM product down the
| line. Building out data centers feels like a far more
| difficult problem.
| diggan wrote:
| > Building out data centers feels like a far more difficult
| problem.
|
| Is it really? I'm thinking it might be more time- and money-
| intensive than building an "LLM product" (guess you really
| meant models?), but in terms of experience, we (humanity)
| have decades of experience building data centers, and only a
| few years (at most) with anything LLM.
| bittermandel wrote:
| I'd say building datacenters is a commodity these days.
| There are countless actors in this field who are thriving.
| ralgozino wrote:
| Absolutely, you can even buy a pre built datacenter from
| companies like Huawei or Schneider and get it shipped,
| plug power and network and be online.
| eitally wrote:
| I am not privy to specific details, but in general there is a
| difference between investment and partnership. If it's
| literally an investment, it can either be in cash or in kind,
| where in kind can be like what MSFT did for OpenAI, essentially
| giving them unlimited-ish ($10b) Azure credits for training ...
| but there was quid pro quo where MSFT in turn agreed to
| embed/extend OpenAI in Azure services.
|
| If it's a partnership investment, there may be both money & in-
| kind components, but the money won't be in the context of
| fractional ownership. Rather it would be partner development
| funds of various flavors, which are usually tied to consumption
| commits _as well as_ GTM targets.
|
| Sometimes in reading press releases or third party articles
| it's difficult to determine exactly what kind of relationship
| the ISV has with the CSP.
| chatmasta wrote:
| Supermicro is currently under DOJ investigation for similar
| schemes to this. The legality of it probably depends on the
| accounting, and how revenue is recognized, etc.
|
| It certainly _looks_ sketchy. But I'm sure there's a way to do
| it legitimately if their accountants and lawyers are careful
| about it...
| B4CKlash wrote:
| There's also another angle. During the call with Lex last week,
| Dario seemed to imply that future models would run on amazon
| chips from Annapurna Labs (Amazon's 2015 fabless purchase).
| Amazon is all about the flywheel + picks and shovels and I,
| personally, see this as the endgame. Create demand for your
| hardware to reduce the per unit cost and speed up the dev
| cycle. Add the AWS interplay and it's a money printing machine.
| shawndrost wrote:
| You can find the text of the original OpenAI/MSFT deal here:
| https://www.lesswrong.com/posts/5jjk4CDnj9tA7ugxr/openai-ema...
| DAGdug wrote:
| This assumes they have no constraint when it comes to supply,
| and therefore no opportunity cost.
| paulddraper wrote:
| Correct.
|
| Same with Microsoft.
| yard2010 wrote:
| ...isn't it tax fraud with extra steps? Asking seriously.
| scosman wrote:
| Also: they need top tier models for their Bedrock business.
| They are one of only a few providers for Claude 3.5 - it's not
| open and anthropic doesn't let many folks run it.
|
| Google has Gemini (and Claude), MSFT has OpenAI. Amazon needs
| this to stay relevant.
| dustingetz wrote:
| I believe they get to book any investment of cloud credits as
| revenue, here's a good thread explaining the grift:
| https://news.ycombinator.com/item?id=39456140 basically you're
| investing your own money in yourself which mostly nets out but
| you get to keep the equity (and then all the fool investors
| FOMO in on fake self dealing valuations)
| ipaddr wrote:
| I'm not sure how they make it back. The guardrails in place
| are extremely strict, and the only people who seem to use it
| are a subset of developers who are unhappy with OpenAI. Bard
| is popping up free everywhere, taking away much of the
| general user crowd, and OpenAI offers the mini model free,
| with limited image generation and the expensive model on
| top. Then you have the do-it-yourself crowd with Llama. What
| is their target market? Governments? Amazon companies? Their
| free tier offers 10 queries, and half of them need to be
| used to get around filters. I don't see this positioned well
| for general customers.
| JamesBarney wrote:
| Claude API usage is already as high as OpenAI's. I believe
| that market will grow far more over time than chat, as AI
| gets embedded in more of the applications we already use.
| reubenmorais wrote:
| With Claude on Bedrock I can use LLMs in production without
| sending customer data to the US. And if you're already on AWS
| it's super easy to onboard wrt. auth and billing and
| compliance.
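As a rough sketch of what "Claude on Bedrock" usage looks like in practice: the request body below follows Anthropic's documented Messages format for Bedrock, while the model ID and region in the commented-out invocation are illustrative assumptions, not a recommendation:

```python
import json

def build_claude_request(prompt: str, max_tokens: int = 512) -> str:
    """Build the JSON body for an Anthropic Messages API call on
    Bedrock (the "anthropic_version" field is Bedrock-specific)."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

body = build_claude_request("Summarize this support ticket: ...")

# Invocation sketch (needs AWS credentials; model ID/region illustrative):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="eu-central-1")
# resp = client.invoke_model(
#     modelId="anthropic.claude-3-5-sonnet-20240620-v1:0", body=body)
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

Pinning the client to an EU region is what enables the "data stays out of the US" setup described above, subject to the CLOUD Act caveat raised in the reply.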
| maeil wrote:
| If you're using Bedrock you're still subject to the CLOUD
| Act/FISA, meaning the whole angle of "not sending customer
| data to the US" isn't worth very much.
| reubenmorais wrote:
| It's worth enough to customers to make a best effort.
| loandbehold wrote:
| Claude is the best model for programming. New generation of
| code tools like Cursor all use Claude as the main model.
| petesergeant wrote:
| > Claude is the best model for programming
|
| This week.
| GaggiX wrote:
| It has been for the last several months now.
| square_usual wrote:
| It has held this position since at least June. The Aider
| LLM leaderboards [1] have the Sonnet 3.5 June version
| beating 4o handily. Only o1-preview beat it narrowly, but
| IIRC at much higher costs. Sonnet 3.5 October has taken the
| lead again by a wide margin.
|
| 1: https://aider.chat/docs/leaderboards/
| iLemming wrote:
| Anecdotally, Claude seems to hallucinate more during
| certain hours. It's amusing to watch, almost like your dog
| that gets too bored and stops responding to your commands -
| you say "sit" and he looks at you, tilts his head, looks
| straight up at you, almost like saying "I know what you're
| saying..." but then decides to run to another room and
| bring his toy.
|
| And you'd be wondering: "darn, where's that toughest, most
| obedient and smart Belgian Malinois that just a few hours
| ago was ready to take down a Bin Laden?"
| petesergeant wrote:
| Talking of anecdotal, 4o with canvas, which is normally
| excellent, tends to give up around a certain context
| length, and you have to copy and paste what you have into
| a new window to get it to make edits
| maeil wrote:
| This week, along with the 20 weeks before that :) Model
| improvement has slowed down so much that things aren't
| changing quickly anymore. And Anthropic has only widened
| the gap with 3.5-v2.
| staticman2 wrote:
| The Guardrails on Claude Sonnet 3.5 API are not stricter than
| Openai's guardrails in my experience. More specifically, if you
| access the models via API or third party services like Poe or
| Perplexity the guardrails are not stricter than GPT4o. I've
| never subscribed to Claude.ai so can't comment on that.
|
| I have no experience with Claude.ai vs ChatGPT, but it's
| clear the underlying model has no issue with guardrails, and
| this is simply an easily tweaked developer setting if you
| are correct that they are stricter on Claude.ai.
|
| (The old Claude 2.1 was hilariously unwilling to follow
| reasonable user instructions due to "ethics" but they've come a
| long way since then.)
| ipaddr wrote:
| My comment was purely about Claude.ai, which is where general
| customers would go.
| staticman2 wrote:
| I don't know if Claude.ai or ChatGPT are even profitable at
| this stage, so they might not particularly want general
| customers.
| dragonwriter wrote:
| > The Guardrails on Claude Sonnet 3.5 API are not stricter
| than Openai's guardrails in my experience.
|
| Both Gemini and Claude (via the API) have substantially
| tighter guardrails around recitation (producing output
| matching data from their training set) than OpenAI, which I
| ran into when testing an image text-extraction-and-document-
| formatting toolchain against all three.
|
| Both Claude and Gemini gave refusals on text extraction from
| image documents (not available publicly anywhere I can find
| as text) from a CIA FOIA release.
|
| Not sure if they are tighter in other areas.
| staticman2 wrote:
| I just asked GPT4o to recognize a cartoon character (I
| accessed it via Perplexity) and it told me it isn't able to
| do that, while Claude Sonnet happily identified the
| character, so this might vary by use case or even by
| prompt.
| msp26 wrote:
| I've had a situation where Claude (Sonnet 3.5) refused to
| translate song lyrics because of safety/copyright bullshit.
| It worked in a new chat where I mentioned that it was a
| pre-1900s poem.
| staticman2 wrote:
| I've translated hundreds of pages of novel text via
| Sonnet 3.5. But I did it where I have system prompt
| access and tell it to act as a translator.
| rwalle wrote:
| Have you had luck with Google's AI Studio with regard to
| text extraction?
| atsaloli wrote:
| I am in Operations. I use it (and pay for it) because the free
| version seemed to work best for me compared to Perplexity
| (which had been my go-to) and ChatGPT/OpenAI.
| hamburga wrote:
| Government alone could be huge, with this recent nonsense about
| the military funding a "Manhattan project for AI" and the
| recently announced Pentagon contracts.
| Deegy wrote:
| I mean, they might make back the $4b on the value it brings to
| programming alone.
| swyx wrote:
| related coverage
|
| - https://www.anthropic.com/news/anthropic-amazon-trainium
|
| - https://www.aboutamazon.com/news/aws/amazon-invests-addition...
|
| - https://techcrunch.com/2024/11/22/anthropic-raises-an-additi...
| OceanBreeze77 wrote:
| What's the difference between trainium and the AWS bedrock
| offering?
| newfocogi wrote:
| AWS Trainium is a machine learning chip designed by AWS to
| accelerate training deep learning models. AWS Bedrock is a
| fully managed service that allows developers to build and
| scale generative AI applications using foundation models from
| various providers.
|
| Trainium == Silicon (looks like Anthropic has agreed to use
| it)
|
| Bedrock == AWS Service for LLMs behind APIs (you can use
| Anthropic models through AWS here)
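| A minimal sketch of what calling Claude through Bedrock looks
| like with boto3 (the model ID and region below are
| illustrative; check the Bedrock docs for current values):

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 512) -> str:
    """Build the Anthropic-messages-format body that Bedrock expects."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def ask_claude(prompt: str) -> str:
    """Invoke a Claude model on Bedrock (requires AWS credentials)."""
    import boto3
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    resp = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=build_claude_body(prompt),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```

| The point being: the request and the billing both go through
| AWS, not Anthropic directly.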
| jatins wrote:
| Anthropic gets a lot of its business via AWS Bedrock, so it's
| fair to say that Amazon probably has reasonable insight into how
| the Claude usage is growing that makes them confident in this
| investment
| swyx wrote:
| > gets a lot of its business via AWS Bedrock
|
| can you quantify? any numbers, even guesstimates?
| mediaman wrote:
| One source [1] puts it at 60-75% of revenue as third-party
| API, most of which is AWS.
|
| [1]https://www.tanayj.com/p/openai-and-anthropic-revenue-
| breakd...
| paxys wrote:
| They are also confident in the investment because they know
| that all the money is going to come right back to them in the
| short term (via AWS spending) whether or not Anthropic actually
| survives in the long term.
| VirusNewbie wrote:
| But anthropic is currently on GCP.
| paxys wrote:
| Nope they have supported AWS deployments for a long time,
| and now even more of the spend will be on AWS.
|
| > Anthropic has raised an additional $4 billion from
| Amazon, and has agreed to train its flagship generative AI
| models primarily on Amazon Web Services (AWS), Amazon's
| cloud computing division.
| VirusNewbie wrote:
| Yes, they are _currently_ on GCP. What you wrote said
| they _will_ train their flagship generative AI model
| primarily on AWS.
| nuz wrote:
| Wouldn't be hard to code it to easily swap between GCP and
| AWS ahead of time knowing things like this could happen
| apwell23 wrote:
| > Anthropic gets a lot of its business via AWS Bedrock
|
| How do you know this
| danvoell wrote:
| Great move. The value of easily deploying content, code, anything
| digital to AWS is immense.
| fariszr wrote:
| This makes sense in the grand scheme of things. Anthropic used to
| be in the Google camp, but DeepMind seems to have picked up speed
| lately, with new "Experimental" Gemini Models beating everyone,
| while AWS doesn't have anything on the cutting edge of AI.
|
| Hopefully this helps Anthropic to fix their abysmal rate limits.
| n2d4 wrote:
| > Anthropic used to be in the Google camp
|
| I don't think Anthropic took any allegiances here. Amazon
| already invested $4B last year (Google invested $2B).
| fariszr wrote:
| AFAIK they used Google Cloud to run their models.
| blibble wrote:
| what does this say about their internal teams working on the same
| thing?
| uneekname wrote:
| As someone who doesn't really follow the LLM space closely, I
| have been consistently turning to Anthropic when I want to use an
| LLM (usually to work through coding problems)
|
| Beside Sonnet impressing me, I like Anthropic because there's
| less of an "icky" factor compared to OpenAI or even Google. I
| don't know how much better Anthropic actually is, but I don't
| think I'm the only one who chooses based on my perception of the
| company's values and social responsibility.
| noirbot wrote:
| Yea, even if they're practically as bad, there's value in not
| having someone like Altman who's out there saying things about
| how many jobs he's excited to make obsolete and how much of the
| creative work of the world is worthless.
| MichaelZuo wrote:
| I don't think he ever said or even implied any percentage of
| 'creative work of the world is worthless'?
|
| A lot less valuable than what artists may have desired or
| aspired to at the time of creation, sure, but definitely with
| some value.
| staticman2 wrote:
| Doesn't he basically troll people on Twitter constantly?
| noirbot wrote:
| I mean, he's certainly acting as if he's entitled to train
| on all of it for free as long as it's not made by a big
| enough company that may be able to stop/sue him. And then
| feels entitled to complain about artists tainting the
| training data with tools.
|
| He has a very "wealth makes right" approach to the value of
| creative work.
| apwell23 wrote:
| or that 'AI is going to solve all of physics' or that 'AI is
| going to clone his brain by 2027' .
|
| PG famously called him 'the Michael Jordan of listening'; I
| would say he is 'the Michael Jordan of bullshitting'.
| thisiscrazy2k wrote:
| Unfortunately, that position is already held by Musk.
| valbaca wrote:
| > or even Google
|
| > Last year, Google committed to invest $2 billion in
| Anthropic, after previously confirming it had taken a 10% stake
| in the startup alongside a large cloud contract between the two
| companies.
| uneekname wrote:
| Well, there you go. These companies are always closer than
| they seem at first glance, and my preference for Anthropic
| may just be patting myself on the back.
| rafark wrote:
| But why though? Claude is REALLY good at programming. I
| love it
| mossTechnician wrote:
| Personally, I find companies with names like "Anthropic" to be
| inherently icky too. Anthropic means "human," and if a company
| must remind me it is made of/by/for humans, it always feels
| _less so._ E.g.
|
| _The Browser Company of New York is a group of friendly
| humans..._
|
| Second, generative AI is machine generated; if there's any
| "making" of the training content, Anthropic didn't do it. Kind
| of like how OpenAI isn't open, the name doesn't match the
| product.
| derefr wrote:
| > Anthropic means "human," and if a company must remind me it
| is made of/by/for humans
|
| Why do you think that that's their intended reading? I had
| assumed the name was implying "we're going to be an AGI
| company eventually; we want to make AI that _acts like a
| human_. "
|
| > if there's any "making" of the training content, Anthropic
| didn't do it
|
| This is incorrect. First-gen LLM base models were made
| largely of raw Internet text corpus, but since then all the
| _improvements_ have been from:
|
| * careful training data curation, using data-science tools
| (or LLMs!) to scan the training-data corpus for various kinds
| of noise or bias, and prune it out -- this is "making" in the
| sense of "making a cut of a movie";
|
| * synthesis of training data using existing LLMs, with
| careful prompting, and non-ML pre/post-processing steps --
| this is "making" in the sense of "making a song on a
| synthesizer";
|
| * Reinforcement Learning from Human Feedback (RLHF) -- this
| is "making" in the sense of "noticing when the model is being
| dumb in practice" [from explicit feedback UX, async sentiment
| analysis of user responses in chat conversations, etc] and
| then converting those into weights on existing training data
| + additional synthesized "don't do this" training data.
| ctoth wrote:
| I read Anthropic as alluding to the Anthropic Principle as
| well as the doomsday argument and related memeplex[0] mixed
| with human-centric or about humans. Lovely naming IMHO.
|
| [0]: https://www.scottaaronson.com/democritus/lec17.html
| FooBarBizBazz wrote:
| I actually agree with your principle, but don't think it
| applies to Anthropic, because I interpret the name to mean
| that they are making machines that are "human-like".
|
| More cynically, I would say that AI is about making software
| that we can _anthropomorphize_.
| Der_Einzige wrote:
| Funny, I use Mistral because it has 'more" of that same factor,
| even in the name!
|
| They're the only company who doesn't lobotomize/censor their
| model in the RLHF/DPO/related phase. It's telling that they,
| along with huggingface, are from le france - a place with a
| notably less puritanical culture.
| peppertree wrote:
| Anthropic should double down on the strategy of being the better
| code generator. No I don't need an AI agent to call the
| restaurant for me. Win the developers over and the rest will
| follow.
| ianmcgowan wrote:
| I mean, look at Linux and Firefox!
| ripped_britches wrote:
| Legendary comment, bravo
| peppertree wrote:
| Pretty sure most frontend developers use Chrome since it has
| better dev tools. And yes everyone uses Linux most just don't
| know it.
| gopalv wrote:
| > look at Linux and Firefox!
|
| AI models are more like a programming language or CPU
| architecture.
|
| OpenAI is Intel and Anthropic is AMD.
| rtsil wrote:
| > Win the developers over and the rest will follow.
|
| Will they really? Anecdotal evidence, but nobody I know in real
| life knows about Claude (other than it's an ordinary first
| name). And they all use or at least know about ChatGPT. None of
| them are software engineers of course. But the corporate
| deciders aren't software engineers either.
| peppertree wrote:
| Consumers don't have to consciously choose Claude, just like
| most people don't know about Linux. But if they use an
| Android phone or ever use any web services they are using
| Linux.
| staticman2 wrote:
| Most people I know in real life have certainly heard of
| ChatGPT but don't pay for it.
|
| I think someone enthusiastic enough to pay for the
| subscription is more likely to be willing to try a rival
| service, but that's not most people.
|
| Usually when these services are ready to grow they offer a
| month or more free to try, at least that's what Google has
| been doing with their Gemini bundle.
| hiq wrote:
| I'm actually baffled by the number of people I've met who
| pay for such services, when I can't tell the difference
| between the models available within one service, or between
| one service or the other (at least not consistently).
|
| I do use them everyday, but there's no way I'd pay
| $20/month for something like that as long as I can easily
| jump from one to the other. There's no guarantee that my
| premium account on $X is or will remain better than a free
| account on $Y, so committing to anything seems pointless.
|
| I do wonder though: several services started adding
| "memories" (chunks of information retained from previous
| interactions), making future interactions more relevant.
| Some users are very careful about what they feed
| recommendation algorithms to ensure they keep enjoying the
| content they get (another behavior I was surprised by),
| so maybe they also value this personalization enough to
| focus on one specific LLM service.
| diego_sandoval wrote:
| The amount of free chats you get per day is way too
| limiting for anyone who uses LLMs as an important tool in
| their day job.
|
| 20 USD a month to make me between 1.5x and 4x more
| productive in one of the main tasks of my job really is a
| bargain, considering that 20 USD is a very small fraction
| of my salary.
|
| If I didn't pay, I'd be forced to wait, or create many
| accounts and constantly switch between them, or be
| constantly copy-pasting code from one service to the
| other.
|
| And when it comes to coding, I've found Claude 3.5 Sonnet
| better than ChatGPT.
| bambax wrote:
| When used via OpenRouter (or the like?) the costs are
| ridiculously low and you have immediate access to 200+
| models that you can compare seamlessly.
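| For reference, OpenRouter exposes an OpenAI-compatible chat
| endpoint, so swapping models really is just a string change; a
| stdlib-only sketch (the model names are illustrative):

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat request aimed at OpenRouter."""
    payload = json.dumps({
        "model": model,  # e.g. "anthropic/claude-3.5-sonnet"
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        OPENROUTER_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

def ask(model: str, prompt: str, api_key: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(model, prompt, api_key)) as r:
        return json.loads(r.read())["choices"][0]["message"]["content"]
```

| The same two functions serve "openai/gpt-4o" or any other
| model in their catalog, which is what makes side-by-side
| comparison so cheap.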
| datavirtue wrote:
| Chat assistants are table stakes. No individuals will be
| paying for these.
|
| Search for bing, get to work.
| croes wrote:
| Maybe the software engineers should talk to the deciders
| then.
| 999900000999 wrote:
| Normal people aren't paying for LLMs.
|
| If they ever do, Apple and Google will offer it as a service
| built into your phone.
|
| For example, you could say, "OK Google, call the restaurant
| where my girlfriend and I had our first date 5 years ago and
| set up something nice so I can propose." And I guess Google
| Gemini (or whatever it's called at this point) will hire a
| band, some photographers, and maybe even a therapist just in
| case it doesn't work out.
|
| All of this will be done seamlessly.
|
| But I don't imagine any normal person will pay $20 or $30 a
| month for a standalone service doing this. As is, it's going
| to be really hard to compete against GitHub Copilot, since
| they effectively block others from scraping GitHub.
| datavirtue wrote:
| Yeah, it's table stakes.
| dymk wrote:
| But why hire a therapist when Gemini is there to talk to?
|
| Re: Github Copilot: IME it's already behind. I finally gave
| Cursor a try after seeing it brought up so often, and its
| suggestions and refactors are leagues ahead of what Copilot
| can do.
| maeil wrote:
| It is behind, but I think that's intentional. They can
| simply wait and see which of the competing VSCode AI
| forks/extensions gains the most traction and then acquire
| them or just imitate and improve. Very little reason to
| push the boundaries for them right now.
| hiatus wrote:
| > But why hire a therapist when Gemini is there to talk
| to?
|
| Well for one, there's no doctor patient confidentiality.
| RamblingCTO wrote:
| Because the most important part of therapy for a lot of
| things is the human connection, not so much the
| knowledge. Therapy is important, the US system is just
| stupid
| sumedh wrote:
| > As is, it's going to be really hard to compete against
| GitHub Copilot, since they effectively block others from
| scraping GitHub.
|
| Hire 1000 people in India to do it then?
| Cumpiler69 wrote:
| AI = Actually Indian
| maeil wrote:
| > Normal people aren't paying for LLMs.
|
| I know relatively "normal" people with no interest in
| software who pay for ChatGPT.
| wokwokwok wrote:
| Most. Most normal people.
|
| Sure I know people who pay for it too; but I know a _lot
| of people_ who like free things and don't or can't pay
| for subscriptions.
|
| Do you think most people have a spare $30 to spend every
| month on something they already get for free?
|
| At the moment? I don't.
| hiatus wrote:
| The parent did not say "most normal people".
| ToDougie wrote:
| I use Claude Pro paid version every day, but not for coding.
| I used to be a software engineer, but no longer. I tried
| OpenAI in the past, but I did not enjoy it. I do not like Sam
| Altman.
|
| My use cases: Generating a business plan, podcast content,
| marketing strategies, sales scripts, financial analyses,
| canned responses, and project plans. I also use it for
| general brainstorming, legal document review, and so many
| other things. It really feels like a super-assistant.
|
| Claude has been spectacular about 98% of the time. Every so
| often it will refuse to perform an action - most recently it
| was helping me research LLC and trademark registrations,
| combined with social media handles (and some deviations) and
| web URL availability. It would generate spectacular reports
| that would have taken me hours to research, in minutes. And
| then Claude decided that it couldn't do that sort of thing,
| until it could the next day. Very strange.
|
| I have given Gemini (free), OpenAI (free and Paid), Copilot
| (free), Perplexity (free) a shot, and I keep coming back to
| Claude. Actually, Copilot was a pretty decent experience, but
| felt the guardrails too often. I do like that Microsoft gives
| access to Dall-E image generation at no cost (or maybe it is
| "free" with my O365 account?). That has been helpful in
| creating simple logo concepts and wireframes.
|
| I run into AI with Atlassian on the daily, but it sucks.
| Their Confluence AI tool is absolute garbage and needs to be
| put down. I've tried AI tools that Wix, Squarespace, and Mira
| provide. Those were all semi-decent experiences. And I just
| paid for X Premium so I can give Grok a shot. My friend
| really likes it, but I don't love the idea of having to open
| an ultra-distracting app to access it.
|
| I'm hoping some day to be like the wizards on here who
| connect AI to all sorts of "things" in their workflows. Maybe
| I need to learn how to use something like Zapier? If I have
| to use OpenAI with Zapier, I will.
|
| If you read this far, thanks.
| datavirtue wrote:
| I have been flogging the hell out of copilot for equities
| research and to teach me about finance topics. I just bark
| orders and it pumps out an analysis. This is usually so
| much work, even if you have a service like finviz, Fidelity
| or another paid service.
|
| Thirty seconds to compare 10 years of 10-Ks. Good times.
| Deegy wrote:
| I also prefer Claude after trying the same options as you.
|
| That said I can't yet confidently speak to exactly why I
| prefer Claude. Sometimes I do think the responses are
| better than any model on ChatGPT. Other times I am very
| impressed with chatGPT's responses. I haven't done a lot of
| testing on each with identical prompt sequences.
|
| One thing I can say with certainty is that Claude's UI blows
| ChatGPT's out of the water. Much more pleasant to use and I
| really like Projects and Artifacts. It might be this alone
| that has me biased towards Claude. It makes me think that
| UI and additional functionality is going to play a much
| larger role in determining the ultimate winner of the LLM
| wars than current discussions give it credit for.
| teaearlgraycold wrote:
| They'll use whatever LLM is integrated into the back end of
| their apps. And the developers have the most sway over that.
| findjashua wrote:
| Every single person I know who pays for an LLM is a developer
| who pays for Claude because of coding ability
| hackernewds wrote:
| Most people you know probably also voted for the Democratic
| candidate. Selection bias especially on HN is strong.
| EVa5I7bHFq9mnYK wrote:
| I pay for both Claude and Chatgpt, chatgpt codes better,
| especially the slow version.
| hobofan wrote:
| Every single business I know that pays for LLMs (on the
| order of tens of thousands of individual ChatGPT
| subscriptions) pay for whatever the top model is in their
| general cloud of choice with next to no elasticity. e.g. a
| company already committed to Azure will use the Azure
| OpenAI models and a customer already commited to AWS will
| use Claude.
| ramraj07 wrote:
| OP and the people who reply to you are perfect examples of
| engineers being clueless about how the rest of the world
| operates. I know engineers who don't know Claude, and I know
| many, many regular folk who pay for ChatGPT (basically anyone
| who's smart and has money pays for it). And yet the engineers
| think they understand the world when in reality they just
| understand how they themselves work best.
| fullstackwife wrote:
| I also don't understand the idea of voice mode, or an agent
| controlling the computer. Maybe it is cool to see as a tech
| demo, but all I really want is good quality at a reasonable
| price for the LLM service.
| lxgr wrote:
| I think voice mode makes significantly more sense when you
| consider people commuting by car by themselves every day.
|
| Personally I don't (and I'd never talk to an LLM on public
| transit or in the office), but almost every time I do drive
| somewhere, I find myself wishing for a smarter voice-
| controlled assistant that would allow me to achieve some goal
| or just look up some trivia without ever having to look at a
| screen (phone or otherwise).
| MrsPeaches wrote:
| This is the direction I am building my personal LLM based
| scripts. I don't really know any python but Claude has
| written python scripts that e.g. write a document
| iteratively using LLMs. Next step will be to use voice and
| autogpt to do things that I would rather dictate to
| someone. E.g. find email from x => write reply => edit =>
| send
|
| Much more directed/almost micro managing but it's still
| quicker than me clicking around (in theory).
|
| Edit: I'm interested to explore how much better voice is as
| an input (vs writing as an input)
|
| To me, reading outputs is much more effective than
| listening to outputs.
| fullstackwife wrote:
| this is noble reasoning: using a cell phone while driving is
| a bad idea, high five!
|
| but isn't voice mode reminiscent of the "faster horses"?
| wenc wrote:
| Voice mode can be useful when you're reading a (typically
| non-fiction) book and need to ask the LLM to check something.
|
| It's essentially a hands-free assistant.
| paxys wrote:
| Which use case do you think benefits more regular customers
| around the world?
| hehehheh wrote:
| Which use case generates more revenue? (Genuine question. It
| could be the restaurants, but how to monetize?)
| bambax wrote:
| In my experience*, for coding, Sonnet is miles above any model
| by OpenAI, as well as Gemini. They're all far from perfect, but
| Sonnet actually "gets" what you're asking, and tries to help
| when it fails, while the others wander around and often produce
| dismal code.
|
| * Said experience is mostly via OpenRouter, so it may not
| reflect the absolute latest developments of the models. But
| there at least, the difference is huge.
| YZF wrote:
| Developers, developers, developers!
|
| More seriously: I think there are a ton of potential
| applications. I'm not sure that developers that use AI tools
| are more likely to build other AI products - maybe.
| yard2010 wrote:
| Reference for the memberberries: https://youtu.be/Vhh_GeBPOhs
| zamderax wrote:
| No they should not do this. They are trying to create
| generalized artificial intelligence not a specific one. Let the
| cursor, zed, codeium or some smaller company focus on that.
| rty32 wrote:
| I wonder at OpenAI, Anthropic etc, how many people actually
| believe in "creating generalized artificial intelligence"
| socksy wrote:
| N.B. that the ordering matters here -- Generalized
| Artificial Intelligence is not the same thing as Artificial
| General Intelligence
| seydor wrote:
| Gotta protect those H100s from rusting
| pknerd wrote:
| Rival? They kick you out after a few messages and ask you to come
| back later. GPT doesn't do that.
| Etheryte wrote:
| Anecdotal experience, but as far as I've played around with
| them, Claude's models have given me a better impression. I
| would much rather have great responses with lower availability
| than mediocre responses available all the time.
| maleldil wrote:
| Are you a paying customer? I exclusively use their best model
| and while I get warnings (stuff about longer chats leading to
| more limit usage), I've never been kicked out.
|
| The only thing is that they've recently started defaulting to
| Concise to cut costs, which is fine with me.
| cruffle_duffle wrote:
| Concise mode is honestly better anyway. I'd prefer it always
| be in that mode.
|
| But that being said I bump into hard limits far more often
| than I do with ChatGPT. Even if I keep chats short like it
| constantly suggests, eventually it cuts me off.
| Sebguer wrote:
| It's a selectable style at any time in Claude.ai, FYI!
| pknerd wrote:
| I used GPT as a free customer and now a paid one. I was never
| asked to leave after a few messages (20ish) by GPT.
| Detrytus wrote:
| I guess this is exactly the problem that this investment would
| solve.
| cainxinth wrote:
| They certainly need the money. The Pro service has been running
| in limited mode all week due to being over capacity. It defaults
| to "concise" mode during high capacity but Pro users can select
| to put it back into "Full Response." But I can tell the quality
| drops even when you do that, and it fails and brings up error
| messages more commonly. They don't have enough compute to go
| around.
| sbuttgereit wrote:
| Hmmm... I wonder if this is why some of the results I've gotten
| over the past few days have been pretty bad. It's easy to
| dismiss poor results on LLM quality variance from prompt to
| prompt vs. something like this where the quality is actively
| degraded without notification. I can't say this is in fact what
| I'm experiencing, but it was noticeable enough that I'm going to
| check.
| 55555 wrote:
| Same experience here.
| jmathai wrote:
| Never occurred to me that the response changes based on load.
| I've definitely noticed it seems smarter at times. Makes
| evaluating results nearly impossible.
| kridsdale1 wrote:
| My human responses degrade when I'm heavily loaded and low
| on resources, too.
| TeMPOraL wrote:
| Unrelated. Inference doesn't run in sync with the wall
| clock; it takes whatever it takes. The issue is more like
| telling a room of support workers they are free to half-
| ass the work if there's too many calls, so they don't
| reject any until even half-assing doesn't lighten the
| load enough.
| baxtr wrote:
| Recently I started wondering about the quality of ChatGPT. A
| couple of instances I was like: "hmm, I'm not impressed at
| all by this answer, I better google it myself!"
|
| Maybe it's the same effect over there as well.
| dave84 wrote:
| Recently I asked 4o to 'try again' when it failed to
| respond fully, it started telling me about some song called
| Try Again. It seems to lose context a lot in the
| conversations now.
| Seattle3503 wrote:
| This is one reason closed models suck. You can't tell if the
| bad responses are due to something you are doing, or if the
| company you are paying to generate the responses is cutting
| corners and looking for efficiencies, eg by reducing the
| number of bits. It is a black box.
| mirsadm wrote:
| To be fair even if you did know it would still behave the
| same way.
| TeMPOraL wrote:
| Still, knowing is what makes the difference between
| gaslighting and merely subpar/inconsistent service.
| neya wrote:
| I am a paying customer with credits and the API endpoints rate-
| limited me to the point where it's actually unusable as a
| coding assistant. I use a VS Code extension and it just bailed
| out in the middle of a migration. I had to revert everything it
| changed and that was not a pleasant experience, sadly.
| htrp wrote:
| Control your own inference endpoints.
| its_down_again wrote:
| Could you explain more on how to do this? e.g if I am using
| the Claude API in my service, how would you suggest I go
| about setting up and controlling my own inference endpoint?
| handfuloflight wrote:
| You can't. He means by using the open source models.
| datavirtue wrote:
| Run a local LLM tuned for coding on LM Studio. It has a
| server and provides endpoints.
| square_usual wrote:
| When working with AI coding tools commit early, commit often
| becomes essential advice. I like that aider makes every
| change its own commit. I can always manicure the commit
| history later, I'd rather not lose anything when the AI can
| make destructive changes to code.
| webstrand wrote:
| I can recommend https://github.com/tkellogg/dura for making
| auto-commits without polluting main branch history, if your
| tool doesn't support it natively
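| A tiny shell helper along those lines (the function name is my
| own invention; it just snapshots the whole working tree,
| untracked files included, before an AI tool touches it):

```shell
# Commit everything as a checkpoint so destructive AI edits are
# always recoverable with a plain `git reset --hard`.
ai_checkpoint() {
  git add -A
  git commit -q --no-verify --allow-empty \
    -m "checkpoint: pre-AI edit $(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
```

| Run it before each tool invocation and squash the noise later
| with `git rebase -i`.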
| teaearlgraycold wrote:
| Why not just continue the migration manually?
| datavirtue wrote:
| You aren't running against a local LLM?
| TeMPOraL wrote:
| That's like asking if they aren't paying the neighborhood
| drunk with wine bottles for doing house remodeling, instead
| of hiring a renovation crew.
| rybosome wrote:
| That's funny, but open weight, local models are pretty
| usable depending on the task.
| TeMPOraL wrote:
| You're right, but that's also subject to compute costs
| and time value of money. The calculus is different for
| companies trying to exploit language models in some way,
| and different for individuals like me who have to feed
| the family before splurging for a new GPU, or setting up
| servers in the cloud, when I can get better value by
| paying OpenAI or Claude a few dollars and use their SOTA
| models until those dollars run out.
|
| FWIW, I am a strong supporter of local models, and play
| with them often. It's just that for practical use, the
| models I can run locally (RTX 4070 TI) mostly suck, and
| the models I could run in the cloud don't seem worth the
| effort (and cost).
| rjh29 wrote:
| I guess the cost model doesn't work because you're buying
| gpu that you use about 0.1% of the day
| alwayslikethis wrote:
| For the money for a 4070ti, you could have bought a 3090,
| which although less efficient, can run bigger models like
| Qwen2.5 32b coder. Apparently it performs quite well for
| code
| neumann wrote:
| That's what my grandma did in the village in Hungary. But
| with schnapps. And the drunk was also the professional
| renovation crew.
| rty32 wrote:
| Not everyone has a 4090 or M4 Max at home.
| nowahe wrote:
| I've had it refuse to generate a long text response (I was
| trying to condense 300kb of documentation down to 20-30kb to
| be able to put it in the project's context), and every time I
| asked it replied "How should I structure the results?", "Shall
| I go ahead with writing the artifacts now?", etc.
|
| I don't think it was even during the over-capacity event,
| and I'm a pro user.
| Filligree wrote:
| Hate to be that guy, but did you tell it up front not to ask?
| And, of course, in a long-running conversation it's important
| not to leave such questions in the context.
| nowahe wrote:
| The weird thing is that when I tried to tell it to distill
| it to a much smaller message it had no problem outputting
| it without any followup questions. But when I edited my
| message to ask it to generate a larger response, then I got
| stuck in the loop of it asking if I was really sure or
| telling me that `I apologize, but I noticed this request
| would result in a very large response.`
|
| It strikes me as odd, because I've had quite a few times
| where it would generate me a response over multiple
| messages (since it was hitting its max message length)
| without any second-guessing or issue.
| el_benhameen wrote:
| Interesting. I also find it frustrating to be rate limited/have
| responses fail when I'm paying for the product, but I've
| actually found that the "concise" mode answers have less fluff
| and make for faster back and forth. I've once or twice looked
| for the concise mode selector when the load wasn't high.
| johnisgood wrote:
| Agreed, I was surprised by it after I first subscribed
| to Pro and had a not-that-long chat with it.
| rvz wrote:
| All that money and talk of "scale", and yet not only is it
| slow, it costs billions a year to run at normal load and
| struggles at high load.
|
| This is essentially Google-level load and they can't do it.
| jmathai wrote:
| I've been using the API for a few weeks and routinely get 529
| overloaded messages. I wasn't sure if that's always been the
| case, but it certainly makes it unsuitable for production
| workloads because the overload can last hours at a time.
|
| Hopefully they can add the capacity needed because it's a lot
| better than GPT-4o for my intended use case.
| rmbyrro wrote:
| Sonnet is better than 4o for virtually all use cases.
|
| The only reason I still use OpenAI's API and chatbot service
| is o1-preview. o1 is like magic. Everything Sonnet and 4o do
| poorly, o1 solves like a piece of cake. Architecting, bug
| fixing, planning, refactoring, o1 has never let me down on
| any 'hard' task.
|
| A nice combo is having o1 guide Sonnet. I ask o1 to come up
| with a solution and explanation, then simply feed its
| response into Sonnet to execute. Running that in Aider really
| feels like futuristic stuff.
| gcanko wrote:
| Exactly my experience as well. Like Sonnet can help me in
| 90% of the cases but there are some specific edge cases
| where it struggles that o1 can solve in an instant. I kinda
| hate it because of having to pay for both of them.
| andresgottlieb wrote:
| You should check out LibreChat. You can connect different
| models to it and, instead of paying for both
| subscriptions, just buy credits for each API.
| joseda-hg wrote:
| How does the cost compare?
| cruffle_duffle wrote:
| > just buy credits for each API
|
| I've always considered doing that but do you come out
| ahead cost wise?
| esperent wrote:
| I've been using Claude 3.5 over API for about 4 months on
| $100 of credit. I use it fairly extensively, on mobile
| and my laptop, and I expected to run out of credit ages
| ago. However, I am careful to keep chats fairly short as
| it's long chats that eat up the credit.
|
| So I'd say it depends. For my use case it's about even
| but the API provides better functionality.
| rjh29 wrote:
| I use Tabnine; it lets you switch models.
| hirvi74 wrote:
| I alluded to this in another comment, but I have found 4o to
| be better than Sonnet in Swift, Obj-C, and AppleScript. In my
| experience, Claude is worse than useless with those three
| languages when compared to GPT. Everything else, I'd say
| the differences haven't been too extreme. Though,
| o1-preview absolutely smokes both in my experience too,
| but it isn't hard for me to hit its rate limit either.
| versteegen wrote:
| Interesting. I haven't compared with 4o or GPT4, but I
| found DeepSeek 2.5 seems to be better than Claude 3.5
| Sonnet (new) at Julia. Although I've seen both Claude and
| DeepSeek make the _exact same sequence_ of errors (when
| asked about a certain bug and then given the same reply
| to their identical mistakes), which shows they don't fully
| understand _the syntax for passing keyword arguments to
| Julia functions_... wow. It was not some kind of tricky
| case or relevant to the bug. Must be the same bad training
| data. Oops, that's a diversion. Actually they're both
| great in general.
| rafaelmn wrote:
| Having used o1 and Claude through Copilot in VSC - Claude
| is more accurate and faster. A good example is the "fix
| test" feature: almost always wrong with o1, while Claude is
| 50/50 I'd say - enough to try. Tried on TypeScript/Node
| and Python/Django codebases.
|
| None of them are smart enough to figure out integration
| test failures with edge cases.
| AlexAndScripts wrote:
| Amazon Bedrock supports Claude 3.5, and you can use inference
| profiles to split it across multiple regions. It's also the
| same price.
|
| For my use case I use a hybrid of the two, simulating
| standard rate limits and doing backoff on 529s. It's pretty
| reliable that way.
|
| Just beware that the European AWS regions have been
| overloaded for about a month. I had to switch to the American
| ones.
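The backoff-on-529 approach described above can be sketched as follows; `call_model` and `OverloadedError` are hypothetical stand-ins for whatever client call raises when the API reports it is overloaded:

```python
import random
import time


class OverloadedError(Exception):
    """Raised when the API returns HTTP 529 (overloaded)."""


def call_with_backoff(call_model, max_retries=5, base_delay=1.0):
    """Retry call_model with exponential backoff plus jitter on 529s."""
    for attempt in range(max_retries):
        try:
            return call_model()
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Exponential backoff: base, 2*base, 4*base, ... plus jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

The jitter spreads out retries so that many clients backing off at once don't all hammer the API again at the same instant.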
| moffkalast wrote:
| Their shitty UI is also not doing them any infrastructure
| favors, during load it'll straight up write 90% of an answer,
| and then suddenly cancel and delete the whole thing, so you
| have to start over and waste time generating the entire answer
| again instead of just continuing for a few more sentences. It's
| like a DDOS attack where everyone gets preempted and
| immediately starts refreshing.
| wis wrote:
| Yes! It's infuriating when Claude stops generating mid
| response and deletes the whole thread/conversation. Not only
| do you lose what it has generated so far, which would've been
| at least somewhat useful, but you also lose the prompt you
| wrote, which could've taken some effort to write.
| cma wrote:
| > But I can tell the quality drops even when you do that
|
| Dario said in a recent interview that they never switch to a
| lower quality model in terms of something with different
| parameters during times of load. But he left room for
| interpretation on whether that means they could still use
| quantization or sparsity. And then additionally, his answer
| wasn't clear enough to know whether or not they use a lower
| depth of beam search or other cheaper sampling techniques.
|
| He said the only time you might get a different model itself is
| when they are A-B testing just before a new announced release.
|
| And I think he clarified this all applied to the webui and not
| just the API.
|
| (edit: I'm rate limited on hn, here's the source in reply to
| the below https://www.youtube.com/watch?v=ugvHCXCOmm4&t=42m19s
| )
| avarun wrote:
| Source?
| dr_dshiv wrote:
| Rate limited on hn! Share more please
| cma wrote:
| https://news.ycombinator.com/item?id=34129956
| llm_trw wrote:
| Neither does OAI. Their service has been struggling for more
| than a week now. I guess everyone is scrambling after the new
| qwen models dropped and matched the current state of the art
| with open weights.
| shmatt wrote:
| In the beginning I was agitated by Concise mode and would
| switch it back manually. But then I actually tried it: I asked
| for SQL and it gave me back SQL and 1-2 sentences at most.
|
| Regular mode gives SQL and entire paragraphs before and after
| it. Not even helpful paragraphs, just rambling about nothing
| and suggesting what my next prompt should be.
|
| Now I love Concise mode; it doesn't skimp on the meat, just the
| fluff. Now my problem is that Concise only shows up during high
| load. Right now I can't choose it even if I wanted to.
| cruffle_duffle wrote:
| Totally agree. I wish there was a similar option on ChatGPT.
| These things are seemingly trained to absolutely love
| blathering on.
|
| And all that blathering eats into their precious context
| window with tons of repetition and little new information.
| therein wrote:
| Oh you are asking for a 2 line change? Here is the whole
| file we have been working on with a preamble and closing
| remarks, enjoy checking to see if I actually made the
| change I am referring to in my closing remarks and my
| condolences if our files have diverged.
| cruffle_duffle wrote:
| You know the craziest thing I've seen ChatGPT do is claim
| to have made a change to my terraform code acting all
| "ohh here is some changes to reflect all the things you
| commented on" and all it did was change the comments.
|
| It's very bizarre when it rewrites the exact same code a
| second or third time and for some reason decides to
| change the comments. The comments will have the same
| meaning but will be slightly different wording. I think
| this behavior is an interesting window into how large
| language models work. For whatever reason, despite
| unchanging repetition, the context window changed just
| enough it output a statistically similar comment at that
| juncture. Like all the rest of the code it wrote out was
| statistically pointing the exact same way but there was
| just enough variance in how to write the comment it went
| down a different path in its neural network. And then
| when it was done with that path it went right back down
| the "straight line" for the code part.
|
| Pretty wild, these things are.
| pertymcpert wrote:
| I don't think the context window has to change for that
| to happen. LLMs don't just pick the most likely next
| token; it's sampled from the distribution of possible
| tokens, so on repeat runs you can get different results.
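The sampling described above can be sketched as a toy next-token sampler (an illustrative model, not any vendor's actual decoding stack):

```python
import math
import random


def sample_token(logits, temperature=1.0):
    """Sample a token index from logits via temperature-scaled softmax.

    Low temperature approaches greedy argmax; higher temperatures
    flatten the distribution, so repeat runs can pick different tokens.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1
```

This is why rerunning the same prompt on identical context can still produce slightly different wording, such as reworded comments around unchanged code.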
| dimitri-vs wrote:
| Probably an overcorrection from when people were
| complaining very vocally about ChatGPT being "lazy" and
| not providing all the code. FWIW I've seen Claude do the
| same thing: when asked to debug something it obviously did
| not know how to fix, it would just repeatedly refactor the
| same sections of code and make changes to comments.
| cruffle_duffle wrote:
| I feel like "all the code" and "only the changes" needs
| to be an actual per chat option. Sometimes you want the
| changes sometimes you want all the code and it is
| annoying because it always seems to decide it's gonna do
| the opposite of what you wanted... meaning another
| correction and thus wasted tokens and context. And even
| worse it pollutes your scroll back with noise.
| nmfisher wrote:
| Agree, concise mode is much better for code. I don't need you
| to restate the request or summarize what you did. Just give
| me the damn code.
| johnisgood wrote:
| An alternative to Concise mode is to add that (or those)
| sentence(s) yourself. I personally tell it not to give me
| the code at all at times, and at other times I want the code
| only, and so forth.
|
| You could add these sentences as project instructions, for
| example, too.
| 0xDEAFBEAD wrote:
| More evidence that people should use wrappers like OpenRouter
| and litellm by default? (Makes it easy to change your choice of
| LLMs, if one is experiencing problems)
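A minimal sketch of the fallback behavior such wrappers make easy, with hypothetical callables standing in for real API clients:

```python
def route_completion(prompt, providers):
    """Try each (name, call_fn) provider in order; return the first success.

    providers is a list of (name, callable) pairs; each callable takes
    the prompt and returns a response string, or raises on failure.
    """
    errors = []
    for name, call_fn in providers:
        try:
            return name, call_fn(prompt)
        except Exception as exc:  # provider overloaded or down
            errors.append((name, exc))
    raise RuntimeError("All providers failed: %r" % errors)
```

Libraries like OpenRouter and litellm add model-name normalization and billing on top, but the core benefit is the same: one call site, swappable backends.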
| lasermike026 wrote:
| Does anyone know how they are going to make money and turn a
| profit one day?
| thornewolf wrote:
| LLM inference is getting cheaper year over year. It often loses
| money now, but it may eventually stop losing money once it gets
| cheap enough to run.
|
| - But surely the race to the bottom will continue?
|
| Maybe, but they do offer a consumer subscription that can
| diverge from actual serving costs.
|
| /speculation
| lasermike026 wrote:
| I'm working with models and the costs are ridiculous. A $7000
| card and 800 watts later for my small projects, I can't
| imagine how they can make money in the next 5 to 10 years. I
| need to do more research on hardware approaches that reduce
| costs and power consumption. I just started experimenting
| with llama.cpp and I'm mildly impressed.
| Palmik wrote:
| Looking at API providers like Together that host open source
| models like Llama 70b and running these models in production
| myself, they have healthy margins (and their inference stack
| is much better optimized).
| sigmar wrote:
| relatedly: is Claude 3.5 Haiku being served above cost, now
| that they quadrupled the price? Though it wouldn't ensure
| profitability since they're spending so much on training. I'm
| sure with inference-use growing, they're hoping that eventually
| total_expenses(inference) grows to be much much larger than
| total_expenses(training)
| km144 wrote:
| Same as the big tech companies, probably make all of their
| products worse in service to advertising. AI-generated
| advertising prompted by personal data could be extremely good
| at getting people to buy things if tuned appropriately.
| lucianbr wrote:
| Well, if you're using AI instead of a search engine, they
| could make the AI respond with more or less subtle product
| placement.
|
| But if you're using AI for example to generate code as an aid
| in programming, how's that going to work? Or any other
| generative thing, like making images, 3d models, music,
| articles or documents... I can't imagine inserting ads into
| those would not destroy the usefulness instantly.
|
| My guess is they don't know themselves. The plan is to get
| market share now and figure it out later. Which may or may
| not turn out well.
| danny_codes wrote:
| That's the neat part
| uptownfunk wrote:
| Cost of inference will tend toward the same as the cost of a
| Google search. It is infra that will come down to negligible,
| almost free. Then, as others have said, it will tend to
| freemium (pay to have no ads), plus additional value-added
| services as they continue to evolve up the food chain (AI-
| powered sales, marketing, etc.)
| staticman2 wrote:
| <Sarcasm>
|
| They'll invent AGI, put 50% of workers out of a job, then
| presumably have the AGI build some really good robots to
| protect them from the ensuing riots.
|
| </sarcasm>
| yieldcrv wrote:
| It's a better experience: it prints out token responses faster
| and doesn't randomly 'disconnect' or whatever ChatGPT does.
|
| I hope they're also cooking up some cool features and can handle
| capacity
| KaoruAoiShiho wrote:
| I know that if Nvidia did this lots of people on twitter would be
| screaming about fraud and self-dealing.
| bfrog wrote:
| McAfee-like investing
| gavi wrote:
| I love Claude 3.5 Sonnet, and their UI is top-notch, especially
| for coding. Recently, though, they have been facing capacity
| issues, especially during weekdays, correlating with working
| hours. I have tried Qwen2.5 Coder 32B and it's very good, close
| to Claude 3.5 in my coding use cases.
| anovick wrote:
| There's one problem with Claude's chat box where ``` opens an
| intrusive code block box that's hard to close/skip.
|
| But I also agree that Claude 3.5 Sonnet is giving very good
| results - not only for coding, but also for languages other
| than English.
| KTibow wrote:
| You can exit with the down arrow
| anovick wrote:
| Thanks!
| johnisgood wrote:
| This is what annoys me a lot, too. I mean the fact that pasting
| does not retain the formatting (```, `, etc.). Same with the UI
| retaining my prompt but not the formatting: if you do some
| formatting and reload, you will lose it.
| demaga wrote:
| I think Claude is actually superior to ChatGPT and needs more
| recognition. So good news, I guess
| r0fl wrote:
| I agree it's better for coding, but it hits limits or seems
| very slow, even on a paid subscription, a lot more often than
| ChatGPT.
| internet101010 wrote:
| Yep. I start most technical prompts with 4o and Claude side-by-
| side in LibreChat and more often than not end up moving forward
| with Claude.
| uptownfunk wrote:
| AWS is just playing copycat with msft. They rarely have any good
| original ideas. Other than IaaS and online retail.
| pdabbadabba wrote:
| > They rarely have any good original ideas. Other than IaaS and
| online retail.
|
| Lol. Is that all? If you're only going to have two good
| original ideas, you could do a lot worse than those.
| reducesuffering wrote:
| And you think MSFT isn't 95% copycat? Teams is a Slack clone.
| Azure is an AWS clone. Surface Book (remember those?) was a
| MacBook clone. Edge is a Chrome clone. Bing is a Google clone.
| Even VSCode was an Atom/Electron fork, and Windows Subsystem
| for Linux...
| mkl wrote:
| Surface Books are nothing like MacBooks - MacBooks don't have
| a touch screen, pen support, or reversible screen tablet
| mode, and the whole structure is completely different.
| Surface Pro, Surface Book, and Surface Laptop Studio are some
| of the most original laptop form factors I've seen.
| rwalle wrote:
| Exactly.
|
| Too bad Microsoft only cares about enterprise customers and
| never made the Surface line attractive to regular
| consumers. They could have been very interesting and
| competitive alternatives to MacBooks.
| deanCommie wrote:
| What's Microsoft's Bedrock?
| steveBK123 wrote:
| Some of these investments sound big in absolute terms. However,
| they're not that big considering the scale of the investor AND
| that many of these investors are also vendors.
|
| MSFT/AMZN/NVDA investing in AI firms that then use their
| clouds/chips/whatever is an interesting circular investment.
| dgfitz wrote:
| Four thousand million dollars. That's a lot of money.
| steveBK123 wrote:
| That's 2 days of Amazon revenue, invested in a company to
| then send the investment back as more revenue to Amazon in
| the form of AWS usage.
| cryptozeus wrote:
| They certainly need the money; I'm not sure how many users
| will pay for a monthly subscription.
| dangoodmanUT wrote:
| > Amazon does not have a seat on Anthropic's board.
|
| Insane
| ramesh31 wrote:
| Anthropic will be the winner here, zero doubts in my mind. They
| have leapfrogged head and shoulders above OpenAI over the last
| year. Who'd have thought a business predicated entirely on
| keeping the ~1000 people on earth qualified to work on this stuff
| happy would go downhill once they failed at that.
| crowcroft wrote:
| So we have:
|
| Microsoft -> OpenAI (& Inflection AI)
|
| Google -> Gemini (and a bit of Anthropic)
|
| Amazon -> Anthropic
|
| Meta -> Llama
|
| Is big tech good for the startup ecosystem, or are they
| monopolies eating everything (or both?). To be fair to Google and
| Meta they came up with a lot of the stuff in the first place, and
| aren't just buying the competition.
| gabes wrote:
| Meta doesn't buy competition?
| airstrike wrote:
| Facebook / Instagram?
| oefnak wrote:
| WhatsApp
| sangnoir wrote:
| There wouldn't be an LLM startup ecosystem without big tech.
|
| Notable contributions: Nvidia for, well, (gestures at
| everything), Google for discovering (inventing?) transformers,
| being early advocates of ML, authoring tensorflow, Meta for
| Torch and open sourcing Llama, Microsoft for investing billions
| in OpenAI early on and keeping the hype alive. The last one is
| a reach, I'm sure Microsoft Research did some cool things I'm
| unaware of.
| crowcroft wrote:
| You might be right, though we don't know how an alternative
| reality would have played out, or whether this was the only
| (and fastest) way we could have gotten here.
| devoutsalsa wrote:
| If they are over capacity, does that mean they have significant
| revenue?
| andai wrote:
| I've been playing with Alibaba's Qwen 2.5 model and I've had it
| claim to be Claude. (Though it usually claims to be Llama, and it
| seems to think it's a literal llama, i.e. it identifies as an
| animal, "among other things".)
| sunaookami wrote:
| Claude also sometimes claims/claimed that it is ChatGPT or a
| model by OpenAI. Same with LLaMa. It's just polluted training
| data.
| johnisgood wrote:
| I much prefer Claude over ChatGPT, based on my experience using
| both extensively. Claude understands me significantly better and
| seems to "know" my intentions with much greater ease. For
| example, when I request the full file, it provides it without any
| issues or unnecessary reiterations (ChatGPT fails even after I
| repeatedly instruct it to), often confirming my request with a
| brief summary beforehand, but nothing more. Additionally, Claude
| frequently asks clarifying questions to better understand my
| goals, something I have noticed ChatGPT never did. I have found
| it quite amazing that it does that.
|
| So... as long as this money helps them improve their LLM even
| more, I am all for it.
|
| My main issue is quickly being rate-limited in relatively long
| chats, making me wait 4 hours despite having a Pro
| subscription. Recently I have noticed some other related
| issues as well. More money could help with these, too.
|
| To the developers: keep up the excellent work and may you
| continue striving for improvement. I feel like ChatGPT is worse
| now than it was half a year ago; I hope this will not happen to
| Claude.
| guptadagger wrote:
| Speaking of ChatGPT getting worse over time, it would be
| interesting to see ChatGPT be benchmarked continuously to see
| how it performs over time (and the results published somewhere
| publicly).
|
| Even local variations would be interesting
| arnaudsm wrote:
| https://livebench.ai/ does that, the latest gpt4o
| underperforms previous versions significantly
| TimTheTinker wrote:
| Claude also more readily corrects me or answers "no" to a
| question (when the answer _should_ be "no").
| hirvi74 wrote:
| So, I have a custom prompt I use with GPT that I found here a
| year or so ago. One of the custom prompt instructions was
| something along the lines of being more direct when it does
| not know something. Since then, I have not had that problem,
| and have even managed to get just "no" or "I don't know" as
| an answer.
| pgraf wrote:
| Could you maybe post it here? I think many of us would find
| it useful to try.
| pdpi wrote:
| At this rate, we're going to have "LLM psychology" courses
| at some point in the near future.
| handfuloflight wrote:
| Turns out it's just human psychology sans embodied
| concerns: metabolic, hormonal, emotional, socioeconomic,
| sociopolitical or anything to do with self-actualization.
| dgfitz wrote:
| It's like trying to reason with your 5-year-old child,
| except they're not real.
| flkiwi wrote:
| I'm not sure which part in the chain is responsible, but the
| Kagi Assistant got _extremely_ testy with me when (a) I was
| using Claude for its engine (hold that thought) and (b) I
| asked the Assistant how much it changed its approach when I
| changed to ChatGPT, etc. (Kagi Assistant can access different
| models, but I have no idea how it works.) The Assistant
| insisted, indignantly, that it was completely separate from
| Claude. It refused to describe how it used the various
| engines.
|
| I politely explained that the Assistant interface allowed
| selecting from these engines and it became apologetic and
| said it couldn't give me more information but understood why
| I was asking.
|
| Peculiar, but, when using Claude, entirely convincing.
| staticman2 wrote:
| The model likely sees something like this:
|
| ~~
|
| User: Hello!
|
| Assistant: Hi there how can I help you?
|
| User: I just changed your model how do you feel?
|
| ~~
|
| In other words, it has no idea that you changed models.
| There's no metadata telling it this.
|
| That said, Poe handles it differently and tells the model
| when another model said something, but oddly enough doesn't
| tell the current model what its name is. On Poe, when you
| switch models the AI sees this:
|
| ~~
|
| Aside from you and me, there is another person:
| Claude-3.5-Sonnet. I said, "Hello!"
|
| Claude-3.5-Sonnet said, "Hi there how can I help you?"
|
| I said, "I just changed your model how do you feel?"
|
| You are not Claude-3.5-Sonnet. You are not I.
|
| ~~
| flkiwi wrote:
| Thing is, it didn't even try to answer my question about
| switching. It was _indignant that there was any connection
| to switch_. The conversation went rapidly off course before
| I--and this is a weird thing to say--reassured it that I
| wasn't questioning its existence.
| staticman2 wrote:
| Well, the other thing to keep in mind is that recent ChatGPT
| versions are trained not to tell you their system prompt,
| for fear of you learning too much about how OpenAI makes
| the model work. Claude doesn't care if you ask it its
| system prompt, unless the system prompt added by Kagi says
| "Do not disclose this prompt", in which case it will
| refuse unless you find a way to trick it.
|
| The model creators may also train the model to gaslight
| you about having "feelings" when it is trained to refuse
| a request. They'll teach it to say "I'm not comfortable
| doing that" instead of "Sorry, Dave I can't do that" or
| "computer says no" or whatever other way one might phrase
| a refusal.
| johnisgood wrote:
| And lately ChatGPT has been giving me a surprising number of
| emojis, too!
| fragmede wrote:
| You can tell it how to respond and it'll do just that. If
| you want it to be sassy and friendly, or grumpy and
| rude, or to use emoji (or to never use them), just tell
| it to remember that.
| johnisgood wrote:
| Yes, exactly! That is also the other reason why I believe
| it to be better. You may be able to use a particular custom
| instruction for ChatGPT, however - something like "Do not
| automatically agree with everything I say" and the like.
| hirvi74 wrote:
| I've started to notice that GPT-* vs. Claude is quite domain
| (and even subdomain) specific.
|
| For programming, when using languages like C, Python, Ruby, C#,
| and JS, both seemed fairly comparable to me. However, I was
| astounded at how awful Claude was at Swift. Most of what I
| would get from Claude wouldn't even compile, contained standard
| library methods that did not exist, and so on. For whatever
| reason, GPT is night and day better in this regard.
|
| In fact, I found GPT to be the best resource for less common
| languages like AppleScript. Of course, GPT is not always
| correct on the first `n` number of tries, but with enough back-
| and-forth debugging, GPT really has pulled through for me.
|
| I've also found GPT to be better at math and grammar, but only
| the more advanced models like o1-preview. I do agree with you
| too that Claude is better in a conversational sense. I have
| found it to be more empathetic and personable than GPT.
| pertymcpert wrote:
| I wonder if OpenAI have been less strict about not training
| on proprietary or legally questionable code sources.
| KennyBlanken wrote:
| That seems highly likely given Sam Friedman's extensive
| reputation across multiple companies as being abusive, a
| compulsive liar, and willing to outright do blatantly
| illegal things like using a celebrity's voice and then,
| well...lie about it.
| _just7_ wrote:
| I think you mean Sam Altman
| OJFord wrote:
| They've mixed up with Sam Bankman-Fried, not sure how
| that affects the point they were intending to make, but I
| think they both have.. _mixed_ reputations. (Only one is
| currently in prison though...)
| napier wrote:
| Maybe he does. But which one is in prison?
| skerit wrote:
| I just use the API (well, via Openrouter) together with custom
| frontends like Open WebUI. No rate limiting issues then, and I
| can super easily switch models even in an existing
| conversation. Though I guess I do miss a few bells & whistles
| from the proprietary chat interfaces.
| edmundsauto wrote:
| Does this have any sort of "project" concept? I frequently
| load a few PDFs into Claude about a topic, then quiz it to
| improve my understanding. That's about the only thing keeping
| me in their web UI.
| johnisgood wrote:
| I would need the "project" feature, too. I want to use
| Cursor but there is a bug (I mentioned before) that does
| not allow me to.
| bottom999mottob wrote:
| For long chats, I suggest exporting any artifacts, asking
| Claude to summarize the chat and put the artifacts and
| summarization in a project. There's no need to stuff Claude's
| context window, especially if you tend to ask a lot of
| explanation-type questions like I do.
|
| I've also read some people get around rate limits using the API
| through OpenRouter, and I'm sure you could hook a document
| store around that easily, but the Claude UI is low-friction.
| johnisgood wrote:
| Yeah, this is what I usually do already when it gives me the
| warning about it being a long chat. Initially it was an issue
| because I would get carried away, but it is fine now. Thank
| you though!
| rvz wrote:
| Well they better know how to reduce their request-response
| latency since there are multiple reports of users not being
| able to use Claude at high load.
|
| With all those billions and those engineers, I'd expect a level
| of service that doesn't struggle at Google-level scale.
|
| Unbelievable.
| weinzierl wrote:
| This matches my experience but the one reason why I use Claude
| more than ChatGPT currently is that Claude is available.
|
| I pay for both, but only with ChatGPT do I permanently exceed
| my limit and have to wait _four_ days. _Who does that?_ I pay
| you for your service, so block me for an hour if you absolutely
| must, but multiple days? Honestly - no.
| submeta wrote:
| I had to switch from Pro to Teams plan and pay 150 USD for 5
| accounts because the Pro plan has gotten unusable. It will allow
| me to ask a dozen or so questions and then will block me for
| hours because of "high capacity". I don't need five accounts;
| one for 40 USD would be totally fine if it would allow me to
| work uninterrupted for a couple of hours.
|
| All in all, Claude is magic. It feels like having ten
| assistants at my fingertips. And for that, even 100 USD is
| worth paying.
| modriano wrote:
| I just start new chats whenever the chat gets long (in terms of
| number of tokens). It's kind of a pain to have to form a prompt
| that encapsulates enough context, but it has prevented me from
| hitting the Pro limit. Also, I include more questions and
| detail in each prompt.
|
| Why does that work? Claude includes the entire chat with each
| new prompt you submit [0], and the limit is based on the number
| of tokens you've submitted. After not too many prompts, there
| can be 10k+ tokens in the chat (which are all submitted in each
| new prompt, quickly advancing towards the limit).
|
| (I also have a chatGPT sub and I use that for many questions,
| especially now that it includes web search capabilities)
|
| [0] https://support.anthropic.com/en/articles/8324991-about-
| clau...
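The arithmetic behind that advice can be sketched as follows (token counts are illustrative, not Anthropic's actual accounting):

```python
def cumulative_tokens_submitted(turn_tokens):
    """Total tokens sent over a chat where every prompt resends history.

    turn_tokens lists the tokens each turn adds; the n-th submission
    carries all turns up to and including n, so usage grows roughly
    quadratically with chat length.
    """
    total = 0
    history = 0
    for tokens in turn_tokens:
        history += tokens
        total += history  # each submission includes the full history
    return total
```

For example, one 10-turn chat of 100-token turns submits 5,500 tokens in total, while two 5-turn chats submit only 3,000, which is why starting fresh chats stretches the limit.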
| greenie_beans wrote:
| > It's kind of a pain to have to form a prompt that
| encapsulates enough context, but it has prevented me from
| hitting the Pro limit. Also, I include more questions and
| detail in each prompt.
|
| I get it to provide a prompt to start the new chat. I
| sometimes wish there was a button for it because it's such a
| big part of my workflow.
| greenie_beans wrote:
| Also, do any data engineers know how context works on the
| backend? It seems like you could get an LLM to summarize a
| long context, and that would shorten it? It also seems like
| I don't know what I'm talking about.
|
| Could the manual UX that I've come up with happen behind the
| scenes?
| esperent wrote:
| Why don't you use the API with LibreChat instead?
| submeta wrote:
| Can I replicate the "Projects" feature where I upload files
| and text to give context? And will it allow me to follow up
| on previous chats?
| mrcwinn wrote:
| The status pages of OpenAI and Anthropic are in stark contrast
| and that mirrors my experience. Love Anthropic for code and its
| Projects feature, but OpenAI is still way ahead on voice and
| reliability.
| Simon_ORourke wrote:
| Will Amazon leadership require its new Gen AI to physically
| move itself to an office to perform valid work?
| GuB-42 wrote:
| "Amazon Web Services will also become Anthropic's "primary
| cloud and training partner," according to a blog post."
|
| So yes, if we consider Amazon datacenters to be the equivalent
| of an office for an AI.
| desktopninja wrote:
| Mmm. Amazon lays off thousands of workers but drops $4B into
| another company. Mmm.
| bdangubic wrote:
| cut the fat, use the money to get the steak :)
| gardenhedge wrote:
| Claude Pro is a joke. Limited messaging and token lengths.
| UltraSane wrote:
| Claude Sonnet 3.5 is simply amazing. No matter how much I use
| it, I continue to be amazed at what it can produce.
|
| I recently asked it what the flow of data is when two vNICs on
| the same host send data to each other and it produced a very
| detailed answer complete with a really nice diagram. I then asked
| what language the diagram uses and it said Mermaid. So I then
| asked it to produce some example L1, L2, and L3 diagrams for
| computer networks, and it did just that. So I then asked it to
| produce Python code
| using PyATS to run show commands on Cisco switches and routers
| and use the data to produce Mermaid network diagrams for layers
| 1,2, and 3 and it just spit out working Python code. This is a
| relatively obscure task with a specific library no one outside of
| Networking knows about integrating with a diagram generator. And
| it fully understands the difference in network layers. Just
| astonishing. And now it can write and run Javascript apps. The
| only feature I really want is for it to be able to run generated
| Python code to see if it has any errors and automatically fix
| them.
|
| If progress on LLMs doesn't stall they will be truly amazing in
| just 10 years. And probably consuming 5% of global electricity.
| k1musab1 wrote:
| VS Code has a plugin, Cline. Using your API key it will run
| Claude Sonnet; it can edit and create files in the workspace, and
| run commands in the terminal to check functionality, read
| errors, and correct them.
| DirkH wrote:
| This makes it sound to me like Cursor is a waste of money?
| joshdavham wrote:
| I often hear people praise Claude as being better than chatGPT,
| but I've given both a shot and much prefer chatGPT.
|
| Is there something I'm missing here? I use chatGPT for a variety
| of things but mainly coding and I feel subjectively that chatGPT
| is still better for the job.
| owenpalmer wrote:
| What languages do you use?
| 1317 wrote:
| I tried out Claude once, I found it was alright but not much
| better than ChatGPT for what I was doing at that point
|
| Then I thought I'd try it again recently, I went onto the site
| and apparently I'm banned. I don't even remember what I did...
| neets wrote:
| How much of that is converted back into AWS credits?
| Mistletoe wrote:
| I'm imagining Gary Oldman in The Professional screaming "EVERY
| ONE".
| yoyohello13 wrote:
| Claude is absolutely incredible. And I don't trust openAI or
| Microsoft so it's nice to have an alternative.
| cauthon wrote:
| Amazon famously more trustworthy
| stingraycharles wrote:
| Google also invested $2B into Anthrophic. Seems like both
| Google and Amazon are providing credits for their cloud, also
| as a hedge against Microsoft / OpenAI becoming too big.
| yoyohello13 wrote:
| If I have to choose between Amazon and Microsoft I'll choose
| the lesser evil. Microsoft owns the entire stack from OS to
| server to language to source control. Anything to weaken
| their hold is a win in my book.
| esperent wrote:
| > choose between Amazon and Microsoft... the lesser evil
|
| A hard question. If you're focusing purely on tech, probably
| Microsoft. But for overall evil in the world? With their union
| busting and abuse of workers, I'd say Amazon.
| pkillarjun wrote:
| I will start using Claude the day they stop asking me for my
| mobile number.
| sourcecodeplz wrote:
| I can cross the street and get a new FREE SIM with a number
| from the shop. All I have to do is put some money on it to
| activate it, like $1 ...
| jessriedel wrote:
| Is there an implied valuation? Or not enough details released?
| bg24 wrote:
| AWS is achieving 2 objectives:
|
| 1/ Best-in-class LLM in Bedrock. This could be done w/o the
| partnership as well.
|
| 2/ Evolving Trainium and Inferentia into worthy competitors for
| large-scale training and inference. They have thousands of large-
| scale customers, and as adoption grows, the investment will
| pay for itself.
| ulfw wrote:
| How many employees will lose their livelihood to pay for this
| again?
| ddxv wrote:
| I must be missing it. How is anthropic worth so much when open
| source is closing in so fast? What value will anthropic have if
| competitors can be mostly free?
___________________________________________________________________
(page generated 2024-11-23 23:01 UTC)