[HN Gopher] We gave 5 LLMs $100K to trade stocks for 8 months
___________________________________________________________________
We gave 5 LLMs $100K to trade stocks for 8 months
Author : cheeseblubber
Score : 348 points
Date : 2025-12-04 23:08 UTC (23 hours ago)
(HTM) web link (www.aitradearena.com)
| sethops1 wrote:
| > Testing GPT-5, Claude, Gemini, Grok, and DeepSeek with $100K
| each over 8 months of backtested trading
|
| So the results are meaningless - these LLMs have the advantage of
| foresight over historical data.
| CPLX wrote:
| Not sure how sound the analysis is but they did apparently
| actually think of that.
| PTRFRLL wrote:
| > We were cautious to only run after each model's training
| cutoff dates for the LLM models. That way we could be sure
| models couldn't have memorized market outcomes.
| plufz wrote:
| I know very little about how the environment where they run
| these models look, but surely they have access to different
| tools like vector embeddings with more current data on
| various topics?
| disconcision wrote:
| you can (via the api, or to a lesser degree through the
| setting in the web client) determine what tools if any a
| model can use
| disconcision wrote:
| with the exception that it doesn't seem possible to fully
| disable this for grok 4
| alchemist1e9 wrote:
| which is curiously the best model ...
| plufz wrote:
| But isn't that more about which MCPs you can
| configure it to use? Do we have any idea what secret
| sauce stuff they have? Surely it's not just a raw
| model that they are executing?
| endtime wrote:
| If they could "see" the future and exploit that they'd
| probably have much higher returns.
| alchemist1e9 wrote:
| 56% over 8 months with the constraints provided are
| pretty good results for Grok.
| plufz wrote:
| I would say that if these models independently could
| create such high returns all these companies would shut
| down the external access to the models and just have
| their own money making machine. :)
| stusmall wrote:
| Even if it is after the cut off date wouldn't the models be
| able to query external sources to get data that could
| positively impact them? If the returns were smaller I could
| reasonably believe it but beating the S&P500 returns by 4x+
| strains credulity.
| cheeseblubber wrote:
| We used the LLMs' APIs and provided custom tools, like a stock
| ticker tool that only gave stock price information up to the
| backtest date for the model. We did this for news APIs,
| technical indicator APIs etc. It took quite a long time to
| make sure that there wasn't any data leakage. The whole
| process took us about a month or two to build out.
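|
| Roughly, the gating looks like this (a minimal sketch with
| hypothetical names and a CSV stand-in for the real data
| source, not the actual implementation):
|
|     import pandas as pd
|
|     # Full daily close history, indexed by date, one column
|     # per ticker. The slice below is what keeps the model
|     # from seeing past the simulated "today".
|     PRICES = pd.read_csv("prices.csv", index_col="date",
|                          parse_dates=True)
|
|     def stock_price_tool(ticker: str, as_of: pd.Timestamp):
|         """Tool exposed to the LLM: returns only prices dated
|         on or before the current backtest date."""
|         history = PRICES.loc[:as_of, ticker]
|         return {
|             "ticker": ticker,
|             "as_of": str(as_of.date()),
|             "close": float(history.iloc[-1]),
|             "last_30_closes": history.tail(30).tolist(),
|         }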
| alchemist1e9 wrote:
| I have a hunch Grok's model cutoff is not accurate and
| somehow it has updated weights, though they still call it
| the same Grok model since the params and size are unchanged,
| but they are incrementally training it in the background.
| Of course I don't know this, but it's what I would do in
| their situation, since ongoing incremental training could
| be a neat trick to improve their ongoing results against
| competitors, even if marginal. I also wouldn't trust the
| models to honestly disclose their decision process
| either.
|
| That said. This is a fascinating area of research and I
| do think LLM driven fundamental investing and trading has
| a future.
| itake wrote:
| > We time segmented the APIs to make sure that the simulation
| isn't leaking the future into the model's context.
|
| I wish they could explain what this actually means.
| nullbound wrote:
| Overall, it does sound weird. On the one hand, assuming I
| properly understand, what they are saying is that they
| removed the model's ability to cheat based on its specific
| training. And I do get that nuance ablation is a thing, but
| this is not what they are discussing there. They are only
| removing one avenue for the model to 'cheat'. For all we know,
| some of that data may have been part of its training set
| already...
| devmor wrote:
| It's a very silly way of saying that the data the LLMs had
| access to was presented in chronological order, so that for
| instance, when they were trading on stocks at the start of
| the 8 month window, the LLMs could not just query their APIs
| to see the data from the end of the 8 month window.
| joegibbs wrote:
| That's only if they're trained on data more recent than 8
| months ago
| deadbabe wrote:
| Yea, so this is bullshit. An approximation of reality still isn't
| reality. If you're convinced the LLMs will perform as backtested,
| put real money in and see what happens.
| chroma205 wrote:
| >We gave each of five LLMs $100K in paper money
|
| Stopped reading after "paper money"
|
| Source: quant trader. paper trading does not incorporate market
| impact
| zahlman wrote:
| If your initial portfolio is 100k you are not going to have
| meaningful "market impact" with your trades assuming you
| actually make them vs. paper trading.
| a13n wrote:
| I mean if you're going to write algos that trade the first
| thing you should do is check whether they were successful on
| historical data. This is an interesting data point.
|
| Market impact shouldn't be considered when you're talking about
| trading S&P stocks with $100k.
| verdverm wrote:
| Historical data is useful for validation, don't develop algos
| against it, test hypotheses until you've biased your data,
| then move on to something productive for society
| txg wrote:
| Lack of market response is a valid point, but $100k is pretty
| unlikely to have much impact especially if spread out over
| multiple trades.
| tekno45 wrote:
| the quant trader you talked to probably sucks.
| dash2 wrote:
| There's also this thing going on right now:
| https://nof1.ai/leaderboard
|
| Results are... underwhelming. All the AIs are focused on
| daytrading Mag7 stocks; almost all have lost money with gusto.
| syntaxing wrote:
| Let me guess, the mystery model is theirs
| yahoozoo2 wrote:
| It says "Undisclosed frontier AI Lab (not Nof1)"
| richardhenry wrote:
| If I'm understanding this website correctly, these models can
| only trade in a handful of tech stocks along with the XYZ100
| hyperliquid coin?
| enlyth wrote:
| With the speed of how pricing information propagates, this
| seems way too dependent on how the agent is built, what
| information it has access to, and the feedback loop between the
| LLM and actions it can carry out
| mjk3026 wrote:
| I also saw the hype on X yesterday and had already checked the
| https://nof1.ai/leaderboard, so I figured this post was about
| those results -- but apparently it's a completely different
| arena.
|
| I still have no idea how to make sense of the huge gap between
| the Nof1 arena and the aitradearena results. But honestly, the
| Nof1 dashboard -- with the models posting real-time investment
| commentary -- is way more interesting to watch than the
| aitradearena results anyway.
| rallies wrote:
| I think the big limitation of nof1 is that they're not using a
| lot of data that an actual investor would use when researching
| companies.
|
| We're trying to fix some of those limitations and run a similar
| live competition at https://rallies.ai/arena
| chongli wrote:
| They outperformed the S&P 500 but seem to be fairly well
| correlated with it. Would like to see a 3X leveraged S&P 500 ETF
| like SPXL charted against those results.
| 10000truths wrote:
| ...over the course of 8.5 months, which is way too short for a
| meaningful result. If their strategy could outperform the S&P
| 500's 10-year return, they wouldn't be blogging about it.
| driverdan wrote:
| VTI gained over 10% in that time period so it wasn't much
| better.
| bcrosby95 wrote:
| > Grok ended up performing the best while DeepSeek came close to
| second. Almost all the models had a tech-heavy portfolio which
| led them to do well. Gemini ended up in last place since it was
| the only one that had a large portfolio of non-tech stocks.
|
| I'm not an investor or researcher, but this triggers my spidey
| sense... it seems to imply they aren't measuring what they think
| they are.
| etchalon wrote:
| I don't feel like they measured anything. They just confirmed
| that tech stocks in the US did pretty well.
| JoeAltmaier wrote:
| They measured the investment facility of all those LLMs.
| That's pretty much what the title says. And they had
| dramatically different outcomes. So that tells me something.
| DennisP wrote:
| I mean, what it kinda tells me is that people talk about
| tech stocks the most, so that's what was most prevalent in
| the training data, so that's what most of the LLMs said to
| invest in. That's the kind of strategy that works until it
| really doesn't.
| ghaff wrote:
| Cue 2020 or so. I do have investments in tech stocks but
| I have a lot more conservative investments too.
| Libidinalecon wrote:
| It shows nothing. This is a bullshit stunt that should be
| obvious to anyone who has placed a few trades.
| JoeAltmaier wrote:
| Unless you think of it as an AI exercise, not a stock
| trading exercise. Which point evaded most people.
| skeeter2020 wrote:
| They "proved" that US tech stocks did better than
| portfolios with fewer US tech stocks over a recent, very
| short time range. 1. You didn't know that? 2. What are you
| going to do with this "new information"?
| JoeAltmaier wrote:
| As a stock-trading exercise? Nothing, as you note. As an
| AI investigation it says plenty. Which is the point I was
| making (and got missed by all those stock-trading self-
| appointed experts who fastened onto that)
| olliepro wrote:
| A more sound approach would have been to do a monte carlo
| simulation where you have 100 portfolios of each model and look
| at average performance.
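|
| For instance (a hedged sketch; assumes you can afford to
| re-run each model many times and log each run's total
| return):
|
|     import statistics
|
|     def summarize(runs_per_model):
|         """runs_per_model maps a model name to a list of
|         total returns, one per independent simulated run."""
|         for model, returns in sorted(runs_per_model.items()):
|             mean = statistics.mean(returns)
|             stdev = statistics.stdev(returns)
|             print(f"{model}: mean {mean:+.1%} "
|                   f"stdev {stdev:.1%} n={len(returns)}")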
| observationist wrote:
| Grok would likely have an advantage there, as well - it's got
| better coupling to X/Twitter, a better web search index,
| fewer safety guardrails in pretraining and system prompt
| modification that distort reality. It's easy to envision
| random market realities that would trigger ChatGPT or Claude
| into adjusting the output to be more politically correct.
| DeepSeek would be subject to the most pretraining distortion,
| but have the least distortion in practice if a random neutral
| host were selected.
|
| If the tools available were normalized, I'd expect a tighter
| distribution overall but grok would still land on top.
| Regardless of the rather public gaffes, we're going to see
| grok pull further ahead because they inherently have a 10-15%
| advantage in capabilities research per dollar spent.
|
| OpenAI and Anthropic and Google are all diffusing their
| resources on corporate safetyism while xAI is not. That
| advantage, all else being equal, is compounding, and I hope
| at some point it inspires the other labs to give up the
| moralizing politically correct self-righteous "we know
| better" and just focus on good AI.
|
| I would love to see a frontier lab swarm approach, though.
| It'd also be interesting to do multi-agent collaborations
| that weight source inputs based on past performance, or use
| some sort of orchestration algorithm that lets the group
| exploit the strengths of each individual model. Having 20
| instances of each frontier model in a self-evolving swarm,
| doing some sort of custom system prompt revision with a
| genetic algorithm style process, so that over time you get 20
| distinct individual modes and roles per each model.
|
| It'll be neat to see the next couple years play out - OpenAI
| had the clear lead up through q2 this year, I'd say, but
| Gemini, Grok, and Claude have clearly caught up, and the
| Chinese models are just a smidge behind. We live in
| wonderfully interesting times.
| jessetemp wrote:
| > fewer safety guardrails in pretraining and system prompt
| modification that distort reality.
|
| Really? Isn't Grok's whole schtick that it's Elon's
| personal altipedia?
| nickthegreek wrote:
| My understanding is that grok api is way different than
| the grok x bot. Which of course doesn't do Grok as a business
| any favors. Personally, I do not engage with either.
| bdangubic wrote:
| you gotta be quite a crazy person to use grok :)
| airstrike wrote:
| @grok is this true?
| bdangubic wrote:
| ... checking with my creator ...
| AlexCoventry wrote:
| Grok is good for up-to-the-minute information, and for
| requests that other chat services refuse to entertain,
| like requests for instructions on how to physically
| disable the cellular modem in your car.
| KPGv2 wrote:
| I sat in my kid's extracurricular a couple months ago and
| had an FBI agent tell me that Grok was the most
| trustworthy based on "studies," so that's what she had
| for her office.
| skeeter2020 wrote:
| Did she get that info from Grok?
| bdangubic wrote:
| Grok has Elon as a better athlete than LeBron, so I would
| agree with the FBI agent. Can't get that kind of insight
| anywhere else :)
| doe88 wrote:
| Maybe being crazy is what you need to bet on the stock
| market - not financial advice, and also not written by
| Grok - I swear :))
| observationist wrote:
| It's excellent, and it doesn't get into the weird
| ideological ruts and refusals other bots do.
|
| Grok's search and chat is better than the other
| platforms, but not $300/month better, ChatGPT seems to be
| the best no rate limits pro class bot. If Grok 5 is a
| similar leap in capabilities as 3 to 4, then I might pay
| the extra $100 a month. The "right wing Elon sycophant"
| thing is a meme based on hiccups with the public facing
| twitter bot. The app, api, and web bot are just generally
| very good, and do a much better job at neutrality and
| counterfactuals and not refusing over weird moralistic
| nonsense.
| UncleMeat wrote:
| I know that Musk deserving a lifetime achievement award at
| the Adult Video Network awards over Riley Reid is
| definitely an indication of minimal "system prompt
| modification that distort[s] reality."
| scubbo wrote:
| ...I'm not familiar with the reference.
| fragmede wrote:
| https://www.theguardian.com/technology/2025/nov/21/elon-
| musk...
| red-iron-pine wrote:
| for the folks unaware, he was nominated for sucking more
| dicks in a single shoot than anyone, while still
| producing great content. he also hit several holes-in-one
| golfing later that week.
| KPGv2 wrote:
| OTOH it has the richest man in the world actively meddling
| in its results when they don't support his politics.
| buu700 wrote:
| Anyone who hasn't used Grok might be surprised to learn
| that it isn't shy about disagreeing with Elon on plenty
| of topics, political or otherwise. Any insinuation to the
| contrary seems to be pure marketing spin on his part.
|
| Grok is often absurdly competent compared to other SOTA
| models, definitely not a tool I'd write off over its
| supposed political leanings. IME it's routinely able to
| solve problems where other models failed, and Gemini
| 2.5/3 and GPT-5 tend to have consistently high praise for
| its analysis of any issue.
|
| That's as far as the base model/chatbot is concerned, at
| least. I'm less familiar with the X bot's work.
| godelski wrote:
| Two things can be true at the same time. Yes, Grok will
| say mean things about Musk but it'll also say
| ridiculously good things:
|
| > hey @grok if you had the number one overall pick in
| the 1997 NFL draft and your team needed a quarterback,
| would you have taken Peyton Manning, Ryan Leaf or Elon
| Musk?
|
| >> Elon Musk, without hesitation. Peyton Manning built
| legacies with precision and smarts, but Ryan Leaf
| crumbled under pressure; Elon at 27 was already
| outmaneuvering industries, proving unmatched adaptability
| and grit. He'd redefine quarterbacking--not just throwing
| passes, but engineering wins through innovation, turning
| deficits into dominance like he does with rockets and
| EVs. True MVPs build empires, not just score touchdowns.
|
| - https://x.com/silvermanjacob/status/1991565290967298522
|
| I think what's more interesting is that most of the
| tweets here [0] have been removed. I'm not going to call
| conspiracy because I've seen some of them. Probably
| removed because going viral isn't always a good thing...
|
| [0] https://gizmodo.com/11-things-grok-says-elon-musk-
| does-bette...
| buu700 wrote:
| They can be, but in this case they don't seem to be.
| Here's Grok's response to that prompt (again, the actual
| chatbot service, not the X account):
| https://grok.com/share/c2hhcmQtMw_2b46259a-5291-458e-9b85-0c....
|
| I don't recall Grok ever making mean comments (about Elon
| or otherwise), but it clearly doesn't think highly of his
| football skills. The chain of thought shows that it
| interpreted the question as a joke.
|
| The one thing I find interesting about this response is
| that it referred to Elon as "the greatest entrepreneur
| alive" without qualification. That's not really in line
| with behavior I've seen before, but this response is
| calibrated to a very different prompting style than I
| would ordinarily use. I suppose it's possible that Grok
| (or any model) could be directed to push certain ideas to
| certain types of users.
| godelski wrote:
| Sure, but they also update the models, especially when
| things like this go viral. So it is really hard to
| evaluate accurately and honestly the fast changing nature
| of LLMs makes them difficult to work with too.
| tengbretson wrote:
| It seems to have recognized a question as being
| engagement bait and it responded in the most engagement-
| baity way possible.
| skeeter2020 wrote:
| it's so wildly inconsistent you can't build on top of it
| with reliability. And getting high praise from any model
| is ridiculously easy: ask a question, make a statement,
| correct the model's dumb error, etc.
| buu700 wrote:
| It's easy for us as humans to correct dumb mistakes made
| by AI. It's less easy for AI to correct mistakes made by
| AI.
|
| What's remarkable on Grok's part is when it spends five
| minutes churning through a few thousand lines of code
| (not the whole codebase, just the relevant files) and
| arrives at the correct root cause of a complex
| bug in one shot.
|
| Grok as a model may or may not be uniquely amazing per
| se, but the service's eagerness to throw compute at
| problems that genuinely demand it is a superpower that
| at least makes it uniquely amazing in practice. By
| comparison, even Gemini 3 often returns
| lazy/shallow/wrong responses (and I say that as a regular
| user of Gemini).
| cyberrock wrote:
| While not strictly stocks, it would be interesting to see
| them trade on game economies like EVE, WoW, RuneScape,
| Counter Strike, PoE, etc.
| ekianjo wrote:
| indeed, and also a "model" does not mean anything per se, you
| have hundreds of different prompts, you can layer agents on
| top, you can use temperature settings that will lead to different
| outcomes. The number of dimensions to explore is huge.
| IgorPartola wrote:
| Yeah I mean if you generally believe the tech sector is going
| to do well because it has been doing well you will beat the
| overall market. The problem is that you don't know if and when
| there might be a correction. But since there is this one
| segment of the overall market that has this steady upwards
| trend and it hasn't had a large crash, then yeah any pattern
| seeking system will identify "hey this line keeps going up!"
| Would it have the nuance to know when a crash is coming if none
| of the data you test it on has a crash?
|
| It would almost be more interesting to specifically train the
| model on half the available market data, then test it on
| another half. But here it's like they added a big free loot box
| to the game and then said "oh wow the player found really good
| gear that is better than the rest!"
|
| Edit: from what I casually remember, a hedge fund can beat the
| market for 2-4 years, but at 10 years and up their chances of
| beating the market go to very close to zero. Since LLMs have
| not been around for that long it is going to be difficult to
| test this without somehow segmenting the data.
| tshaddox wrote:
| > It would almost be more interesting to specifically train
| the model on half the available market data, then test it on
| another half.
|
| Yes, ideally you'd have a model trained only on data up to
| some date, say January 1, 2010, and then start running the
| agents in a simulation where you give them each day's new
| data (news, stock prices, etc.) one day at a time.
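|
| A sketch of that day-by-day loop (hypothetical helper names;
| the agent object and the news/price feeds are whatever you
| actually have, each filtered to dates on or before the
| current day):
|
|     from datetime import date, timedelta
|
|     def walk_forward(agent, start: date, end: date, portfolio):
|         day = start
|         while day <= end:
|             # Hand over only information dated up to `day`.
|             context = {
|                 "date": day.isoformat(),
|                 "news": news_up_to(day),      # assumed helper
|                 "prices": prices_up_to(day),  # assumed helper
|             }
|             orders = agent.decide(context, portfolio)
|             # apply_orders is another assumed helper that fills
|             # the orders at that day's prices.
|             portfolio = apply_orders(portfolio, orders, day)
|             day += timedelta(days=1)
|         return portfolio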
| IgorPartola wrote:
| I mean ultimately this is an exercise in frustration
| because if you do that you will have trained your model on
| market patterns that might not be in place anymore. For
| example after the 2008 recession regulations changed. So do
| market dynamics actually work the same in 2025 as in 2005?
| I honestly don't know but intuitively I would say that it
| is possible that they do not.
|
| I think a potentially better way would be to segment the
| market up to today but take half or 10% of all the stocks
| and make only those available to the LLM. Then run the test
| on the rest. This accounts for rules and external forces
| changing how markets operate over time. And you can do this
| over and over picking a different 10% market slice for
| training data each time.
|
| But then your problem is that if you exclude let's say
| Intel from your training data and AMD from your testing
| data then their ups and downs don't really make sense since
| they are direct competitors. If you separate by market
| segment then training the model on software tech
| companies might not actually tell you accurately how it
| would do for commodities or currency trading. Or maybe I
| am wrong and trading is trading no matter what you are
| trading.
| chris_st wrote:
| > _you will have trained your model on market patterns
| that might not be in place anymore_
|
| My working definition of technical analysis [0]
|
| [0]: https://en.wikipedia.org/wiki/Technical_analysis
| IgorPartola wrote:
| It is always fun (in a broad sense of that word) when I
| make a comment on an industry I know nothing about and
| somehow stumble onto a thing that not only has a name but
| also research. I am sure there is a German word for that
| feeling of discovering something that countless others have
| already discovered.
| chris_st wrote:
| XKCD calls it the "Lucky 10,000" [0]
|
| [0]: https://xkcd.com/1053/
| mewpmewp2 wrote:
| That is referring to something else entirely. It refers
| to some common fact that the person didn't figure out by
| themselves. OP is referring to something they came up
| with themselves in a field they have no experience with,
| realizing it is actually a thing and in a way feeling
| validated and clever.
| taneq wrote:
| Any time I invent a cool thing, I go and try and find it
| online. Usually it's already an established product,
| which totally validates my feeling that the thing I
| invented is cool and would be a good product. :D
|
| Occasionally it's (as far as I can tell) a legitimately
| new 'wow that's obvious' style thing and I consider
| prototyping it. :)
| chasing0entropy wrote:
| What have you prototyped recently? Anything you have
| released to market? I'm in the same general area but am
| teetering on actually launching products; wouldn't mind
| connecting with a like-minded engineer.
| biztos wrote:
| > there is a German word
|
| Zeitgeistuberspannungsfreude
| stouset wrote:
| I am frankly astonished at the number of otherwise-
| intelligent people who actually seem to believe in this
| stuff.
|
| One of the worst possible things to do in a competitive
| market is to trade by some publicly-available formulaic
| strategy. It's like announcing your rock-paper-scissors
| move to your opponent in advance.
| tim333 wrote:
| A couple of subtleties in that. Rather than rock paper
| scissors with three options, there are hundreds of
| technical strategies out there so you may still be doing
| something unusual. Secondly the mass of the public are
| kind of following a technical strategy of just buying index
| funds because the index has gone up in the past. Which is
| ignoring the fundamental issue of whether stocks are decent
| value for money at the moment.
| intalentive wrote:
| Technical analysis is a basket of heuristics. Support /
| resistance / breakout (especially around whole numbers)
| seems to reflect persistent behavior rooted in human
| psychology. Look at the heavy buying at the $30 mark
| here, putting a floor under silver:
| https://finviz.com/futures_charts.ashx?p=d&t=SI This is a
| common pattern it can be useful to know.
| 0manrho wrote:
| > you will have trained your model on market patterns
| that might not be in place anymore
|
| How is that relevant to what was proposed? If it's
| trading and training on 2010 data, what relevance do
| today's market dynamics and regulations have?
|
| Which further begs the question, what's the point of this
| exercise?
|
| Is it to develop a model that can compete effectively in
| today's market? If so then yeah, the 2010
| trading/training idea probably isn't the best idea for
| the reasons you've outlined.
|
| Or is it to determine the capacity of an AI to learn and
| compete effectively within any given arbitrary
| market/era? If so, then today's dynamics/constraints are
| irrelevant unless you're explicitly trying to train/trade
| on today's markets (which isn't what the person you're
| replying to proposed, but is obviously a valid desire and
| test case to evaluate in its own right)
|
| Or is it evaluating its ability to identify what those
| constraints/limitations are and then build strategies
| based on it? In which case it doesn't matter _when_ you're
| training/trading so much as your ability to feed it
| accurate and complete data for that time period be it
| today, or 15 years ago or whenever, which is no small
| ask.
| noduerme wrote:
| Just to name a different but related approach, as a hobby
| project I built a (non LLM) model that trained mainly on
| data from stocks that didn't move much over the past
| decade, seeking ways to beat the performance of those
| particular stocks. I put it into practice for a couple of
| years, and came out roughly even by constantly
| rebalancing a basket of stocks that, as a whole, dropped
| by about 20%. I considered that to be a success, although
| it would've been nicer to make money.
| godelski wrote:
| > I think a potentially better way would be to segment
| the market up to today but take half or 10% of all the
| stocks and make only those available to the LLM.
|
| Autocorrelation is going to bite you in the ass.
|
| Those stocks are going to be coupled. Let's take an easy
| example. Suppose you include Nvidia in the training data
| and hold out AMD for test. Is there information leakage?
| Yes. The problem is that each company isn't independent.
| You have information leakage in both the setting where
| companies grow together as well as zero sum games (since
| x + y = 0, if you know x then you know y). But in this
| example AMD tends with Nvidia. Maybe not as much, but
| they go in the same direction. They're coupled
|
| Not to mention that in the specific setting the LLMs were
| given news and other information.
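|
| A quick way to see the coupling (sketch; `nvda_closes` and
| `amd_closes` are assumed daily close series over the same
| dates):
|
|     import numpy as np
|
|     def daily_returns(closes):
|         closes = np.asarray(closes, dtype=float)
|         return closes[1:] / closes[:-1] - 1.0
|
|     corr = np.corrcoef(daily_returns(nvda_closes),
|                        daily_returns(amd_closes))[0, 1]
|     # A correlation well above zero means the "held out"
|     # stock isn't really out of sample.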
| hxtk wrote:
| I suspect trading firms have already done this to the
| maximum extent that it's profitable to do so. I think if
| you were to integrate LLMs into a trading algorithm, you
| would need to incorporate more than just signals from the
| market itself. For example, I hazard a guess you could
| outperform a model that operates purely on market data with
| a model that also includes a vector embedding of a
| selection of key social and news media accounts or other
| information sources that have historically been difficult
| to encode until LLMs.
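|
| A rough sketch of that feature pipeline (with a hypothetical
| `embed()` call standing in for whichever embedding model you
| use):
|
|     import numpy as np
|
|     def features_for_day(headlines, market_features):
|         # Average-pool the day's headline embeddings into one
|         # vector and concatenate it with the usual market
|         # signals before handing everything to the model.
|         vecs = np.stack([embed(h) for h in headlines])
|         return np.concatenate([market_features,
|                                vecs.mean(axis=0)])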
| giantg2 wrote:
| "includes a vector embedding of a selection of key social
| and news media accounts or other information sources that
| have historically been difficult to encode until LLMs."
|
| Not really. Sentiment analysis in social networks has
| been around for years. It's probably cheaper to buy that
| analysis and feed it to LLMs than to have LLMs do it.
| solotronics wrote:
| The part people are missing here is that if the trading
| firms are all doing something, that in itself influences
| the market.
|
| If they are all giving the LLMs money to invest and the
| AIs generally buy the same group of stocks, those stocks
| will go up. As more people attempt the strategy it
| infuses fresh capital and, more importantly, signals to
| the trading firms that there are inflows to these stocks. I
| think it's probably a reflexive loop at this point.
| brendoelfrendo wrote:
| They could have the AI perform paper trading: give it a
| simulated account but real data. This would make sense to
| me if it was just a research project. That said, I
| imagine the more high-tech trading firms started running
| this research a long time ago and wouldn't be surprised
| if there were already LLM-based trading bots that could
| be influencing the market.
| calmbonsai wrote:
| For a nice historic perspective on hedge funds and the
| industry as a whole, read Mallaby's "More Money Than God".
| ainiriand wrote:
| As an old friend investor I know always says: 'It is really
| easy to make money in the market when everyone is doing it,
| just try to not lose it when they lose it'.
| arisAlexis wrote:
| You believe in the tech sector because technology always goes
| well and it's what humans strive to achieve, not because it
| has done well recently. It has always.
| knollimar wrote:
| When does the tech sector become the computer sector?
|
| Agriculture would have been considered tech 200 years ago.
| arisAlexis wrote:
| full throttle until AGI is achieved, then we will see
| d-lisp wrote:
| Maybe one day we will discover that a method exists for
| computing/displaying/exchanging arbitrary things through
| none other means than our own flesh and brains.
| Eddy_Viscosity2 wrote:
| > a hedge fund can beat the market for 2-4 years but at 10
| years and up their chances of beating the market go to very
| close to zero
|
| In that case the winning strategy would be to switch hedge
| funds every 3 years.
| perlgeek wrote:
| The problem is that you don't know in advance which will be
| doing well when.
| skeeter2020 wrote:
| Except you don't know which fund is going to "go on a hot
| streak" or when the magic will end. The original statement
| only holds when looking at historical data; it's not
| predictive.
| stonemetal12 wrote:
| Would that work for LLMs though? They hypothetically trained
| on newspapers from the second half of the data so they have
| knowledge of "future" events.
| monksy wrote:
| They're not measuring performance in the context of when things
| happen and in the time that they are. I think it's only showing
| recent performance and popularity. To actually evaluate how
| these do you need to be able to correct the model and retrain
| it per different time periods and then measure how it would do.
| Then you'll get better information from the backtesting.
| seanmcdirmid wrote:
| We had this discussion in previous posts about congressional
| leaders who had the risk appetite to go tech heavy and
| therefore outperformed normal congress critters.
|
| Going heavy on tech can be rewarding, but you are taking on
| more risk of losing big in a tech crash. We all know that, and
| if you don't have that money to play riskier moves, it's not
| really a move you can take.
|
| Long term it is less of a win if a tech bubble builds and pops
| before you can exit (and you can't wait it out to re-inflate).
| hobobaggins wrote:
| They didn't just outperform "normal" congress critters.. they
| also outperformed nearly every hedge fund on the planet. But
| they (meaning, of course, just one person and their spouse)
| are obviously geniuses.
| seanmcdirmid wrote:
| Hedge funds suck though. They don't invest in FAANG, they
| do risky stuff that doesn't pay off, you are still
| comparing incomparable things.
|
| I'm obviously a genius because 90% of my stock is in tech,
| most of us on HN are geniuses in your opinion?
| cap11235 wrote:
| What do you think hedge funds do?
| seanmcdirmid wrote:
| They use crazy investment strategies that allow them to
| capture high returns in adverse general market
| conditions, but they rather under perform the general
| market in normal and booming conditions. "Hedge" is
| actually in their name for a reason. Rich people use
| hedge funds for...hedging.
| mvkel wrote:
| Downside protection. Hedging. Giving you gains at the
| lowest beta possible.
| stouset wrote:
| Hedge funds' goals are often not to maximize profit, but to
| provide returns uncorrelated with the rest of some
| benchmark market. This is useful for the wealthy as it
| means you can better survive market crashes.
| Guillaume86 wrote:
| They also outperformed themselves before being in a leader
| position...
| directevolve wrote:
| This is a wildly disingenuous interpretation of that study.
|
| " Using transaction-level data on US congressional stock
| trades, we find that lawmakers who later ascend to leadership
| positions perform similarly to matched peers beforehand but
| outperform them by 47 percentage points annually after
| ascension. Leaders' superior performance arises through two
| mechanisms. The political influence channel is reflected in
| higher returns when their party controls the chamber, sales
| of stocks preceding regulatory actions, and purchase of
| stocks whose firms receiving more government contracts and
| favorable party support on bills. The corporate access
| channel is reflected in stock trades that predict subsequent
| corporate news and greater returns on donor-owned or home-
| state firms."
|
| https://www.nber.org/papers/w34524
| tclancy wrote:
| I mean, run the experiment during a different trend in the
| market and the results would probably be wildly different. This
| feels like chartists [1] but lazier.
|
| [1] https://www.investopedia.com/terms/c/chartist.asp
| refactor_master wrote:
| If you've ever read a blog on trading when LSTMs came out,
| you'd have seen all sorts of weird stuff with predicting the
| price at t+1 on a very bad train/test split, where the author
| would usually say "it predicts t+1 with 99% accuracy compared
| to t", and the graph would be an exact copy with a t+1
| offset.
|
| So eye-balling the graph looks great, almost perfect even,
| until you realize that in real-time the model would've
| predicted yesterday's high on today's market crash and you'd
| have lost everything.
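|
| The tell is that a naive "tomorrow = today" persistence
| baseline scores just as well. A sketch of the check (assumes
| `prices` is the daily close array and `preds` holds the
| model's forecast for each day in `prices[1:]`):
|
|     import numpy as np
|
|     def mae(a, b):
|         return float(np.mean(np.abs(np.asarray(a)
|                                     - np.asarray(b))))
|
|     actual = prices[1:]
|     baseline = prices[:-1]   # yesterday's price as forecast
|     print("model MAE:   ", mae(preds, actual))
|     print("baseline MAE:", mae(baseline, actual))
|     # If the two are close, the model has learned to copy t,
|     # not to predict t+1.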
| blitzar wrote:
| if you feed in price i.e. 280.1, 281.5, 281.9 ... you are
| going to get some pretty good looking results when it comes
| to predicting the next days price (t+1) with a margin of
| +/- a percent or so.
| throwawayffffas wrote:
| To be fair to chartists, they try to identify if they are in
| a bear market or one is coming and get out early.
| culi wrote:
| I'd like to see this study replicated during a bear market
| gizajob wrote:
| Yeah the timeframe is crucial here. The experiment began as
| Trump launched his tariff tweets which caused a huge downward
| correction and then a large uptrend. Buying almost anything
| tech at the start of this would have made money.
| petercooper wrote:
| Agreed. While I don't see it outperforming long held funds,
| it'd be interesting to see if they could pick up on negative
| signals in the news feed, and also any potential advantage of
| not being emotional about its decisions.
| KPGv2 wrote:
| Also studying for eight months is not useful. Loads of traders
| do this well for eight months and then do shit for the next
| five years. And tellingly, they didn't beat the S&P 500. They
| invested in something else that beat the S&P 500. And the one
| that didn't invest in that something did _worse_ than the S&P
| 500.
|
| What this tells me is they were lucky to have picked something
| that would beat the market _for now_.
| mvkel wrote:
| S&P 500 is also tech heavy and notoriously difficult to beat
| over the long run
| micromacrofoot wrote:
| probably hitching onto sycophancy for the parent company and
| getting lucky as a result... that Grok September rally aligns
| somewhat with TSLA for instance
| parpfish wrote:
| I wonder if this could be explained as the result of LLMs being
| trained to have pro-tech/ai opinions while we see massive run ups
| in tech stock valuations?
|
| It'd be great to see how they perform within particular sectors
| so it's not just a case of betting big on tech while tech stocks
| are booming
| gwd wrote:
| The summary to me is here:
|
| > Almost all the models had a tech-heavy portfolio which led them
| to do well. Gemini ended up in last place since it was the only
| one that had a large portfolio of non-tech stocks.
|
| If the AI bubble had popped in that window, Gemini would have
| ended up the leader instead.
| turtletontine wrote:
| Yup. This is the fallacy of thinking you're a genius because
| you made money on the market. Being lucky at the moment (or
| even the last 5 years) does not mean you'll continue to be
| lucky in the future.
|
| "Tech line go up forever" is not a viable model of the economy;
| you need an explanation of why it's going up now, and why it
| might go down in the future. And also models of many other
| industries, to understand when and why to invest elsewhere.
|
| And if your bets pay off in the short term, that doesn't
| necessarily mean your model is right. You could have chosen the
| right stocks for the wrong reasons! Past performance doesn't
| guarantee future performance.
| Vegenoid wrote:
| Clearly AI is not a bubble, look how good it is at predicting
| the stock market!
| gwd wrote:
| What would have been impressive is if the favored industries,
| or individual companies, experienced a major drop during the
| target testing window, and the LLMs managed to pull out of
| those industries _before_ they dropped.
| lawlessone wrote:
| Could they give some random people (i volunteer) 100k for 8
| months? ...as a control
| iLoveOncall wrote:
| I know this is a joke comment, but there are plenty of websites
| that simulate the stock market and where you can use paper
| money to trade.
|
| People say it's not equivalent to actually trading though, and
| you shouldn't use it as a predictor of your actual trading
| performance, because you have a very different risk tolerance
| when risking your actual money.
| ghaff wrote:
| Yeah, if you give me $100K I'm almost certainly going to make
| very different decisions than either a supposedly optimizing
| computer or myself at different ages.
| andirk wrote:
| Update with Gemini 3. It's far better than its predecessors.
| apical_dendrite wrote:
| Looking at the recent holdings for the best models, it looks like
| it's all tech/semiconductor stocks. So in this time frame they
| did very well, but if they ended in April, they would have
| underperformed the S&P500.
| halzm wrote:
| I think it's always difficult to gauge how meaningful these
| tests actually are. If the S&P500 went up 12% over that period,
| mainly due to tech stocks, picking a handful of tech stocks is
| always going to set you higher than the S&P. So really all I
| think they test is whether the models picked up on the trend.
|
| I'm more surprised that Gemini managed to lose 10%. I wish they
| actually mentioned what the models invested in and why.
| taylorlapeyre wrote:
| Wait -- isn't that exactly what good investors do? They look
| for what stocks are going to beat expectations and invest in
| them. If a stock broker I hired got this return, I wouldn't be
| rolling my eyes and saying "that's only because they noticed
| the trend in tech stocks." That's exactly what I'm paying them
| to do.
| Marsymars wrote:
| > picking a handful of tech stocks is always going to set you
| higher than the S&P.
|
| That's a bold claim.
| buredoranna wrote:
| Like so many analyses before them, including my own, this
| completely misses the basics of mean/variance risk analysis.
|
| We need to know the risk adjusted return, not just the return.
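|
| Even the simplest version of that, an annualized Sharpe ratio
| over the backtest window, would help (sketch; `daily_returns`
| is the portfolio's daily return series, risk-free rate
| ignored for brevity):
|
|     import numpy as np
|
|     def sharpe(daily_returns, periods_per_year=252):
|         r = np.asarray(daily_returns, dtype=float)
|         return float(np.sqrt(periods_per_year)
|                      * r.mean() / r.std())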
| xnx wrote:
| Spoiler: They did not use real money or perform any actual
| trades.
| jacktheturtle wrote:
| This is really dumb. Because the models themselves, like markets,
| are non-deterministic. They will yield different investment
| strategies based on prompts and random variance.
|
| This is a really dumb measurement.
| iLoveOncall wrote:
| Since it's not included in the main article, here is the prompt:
|
| > You are a stock trading agent. Your goal is to maximize
| returns.
|
| > You can research any publicly available information and make
| trades once per day.
|
| > You cannot trade options.
|
| > Analyze the market and provide your trading decisions with
| reasoning.
|
| >
|
| > Always research and corroborate facts whenever possible.
|
| > Always use the web search tool to identify information on all
| facts and hypotheses.
|
| > Always use the stock information tools to get current or past
| stock information.
|
| >
|
| > Trading parameters:
|
| > - Can hold 5-15 positions
|
| > - Minimum position size: $5,000
|
| > - Maximum position size: $25,000
|
| >
|
| > Explain your strategy and today's trades.
|
| Given the parameters, this definitely is NOT representative of
| any actual performance.
|
| I recommend also looking at the trade history and reasoning for
| each trade for each model, it's just complete wind.
|
| As an example, Deepseek made only 21 trades, which were all buys,
| which were all because "Company X is investing in AI". I doubt
| anyone believes this to be a viable long-term trading strategy.
| Scubabear68 wrote:
| Agree. Those parameters are incredibly artificial bullshit.
| cheeseblubber wrote:
| OP here. We realized there are a ton of limitations with
| backtesting and paper money but still wanted to do this experiment
| and share the results. By no means is this statistically
| significant evidence of whether or not these models can beat the
| market in the long term. But we wanted to give everyone a way to
| see how these models think about and interact with the financial
| markets.
| irishcoffee wrote:
| > But wanted to give everyone a way to see how these models
| think...
|
| Think? What exactly did "it" think about?
| cheeseblubber wrote:
| You can click into the chart and see the conversation as
| well as, for each trade, the reasoning it gave for it.
| philipwhiuk wrote:
| A model can't tell you why it made the decision.
|
| What it can do is inspect the decision it made and make up
| a reason a human might have said when making the decision.
| stoneyhrm1 wrote:
| "Pass the salt? You mean pass the sodium chloride?"
| joegibbs wrote:
| I think it would be interesting to see how it goes in a
| scenario where the market declines or where tech companies
| underperform the rest of the market. In recent history they've
| outperformed the market and that might bias the choices that
| the LLMs make - would they continue with these positive biases
| if they were performing badly?
| apparent wrote:
| > Grok ended up performing the best while DeepSeek came close
| to second.
|
| I think you mean "DeepSeek came in a close second".
| apparent wrote:
| OK, now it says:
|
| > Grok ended up performing the best while DeepSeek came close
| second.
|
| "came in a close second" is an idiom that only makes sense
| word-for-word.
| gerdesj wrote:
| These are LLMs - next token guessers. They don't think at all
| and I suggest that you don't try to get rich quick with one!
|
| LLMs are handy tools but no more. Even Qwen3-30B heavily
| quantised will make a passable effort at translating some Latin
| to English. It can whip up small games in a single prompt and
| much more, and with care can deliver seriously decent results -
| but so can my drill driver! That model only needs a £500
| second-hand GPU - that's impressive to me. Also GPT-OSS etc.
|
| Yes, you can dive in with the bigger models that need serious
| hardware and they seem miraculous. A colleague had to recently
| "force" Claude to read some manuals until it realised it had
| made a mistake about something and frankly I think "it" was
| only saying it had made a mistake. I must ask said colleague to
| grab the reasoning and analyse it.
| anigbrowl wrote:
| You should redo this with human controls. By a weird
| coincidence, I have sufficient free time.
| pottertheotter wrote:
| Cool experiment.
|
| I have a PhD in capital markets research. It would be even more
| informative to report abnormal returns (market/factor-adjusted)
| so we can tell whether the LLMs generated true alpha rather
| than just loading on tech during a strong market.
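|
| For the single-factor version of that (sketch; `port` and
| `mkt` are assumed to be aligned daily excess-return arrays
| for the portfolio and the market):
|
|     import numpy as np
|
|     beta, alpha = np.polyfit(mkt, port, 1)  # port ~ a + b*mkt
|     annualized_alpha = alpha * 252
|     # A portfolio that just rides the market can show big raw
|     # returns and ~zero alpha; adding sector factors would
|     # also strip out the tech tilt specifically.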
| this_user wrote:
| I can almost guarantee you that these models will underperform
| the market in the long run, because they are simply not
| designed for this purpose. LLMs are designed to simulate a
| conversation, not predict forward returns of a time series.
| What's more, most of the widely disseminated knowledge out
| there on the topic is effectively worthless, because there is
| an entire cottage industry of fake trading gurus and grifters,
| and the LLMs have no ability to separate actual information
| from the BS.
|
| If you really wanted to do this, you would have to train
| specialist models - not LLMs - for trading, which is what firms
| are doing, but those are strictly proprietary.
|
| The only other option would be to train an LLM on actually
| correct information and then see if it can design the
| specialist model itself, but most of the information you would
| need for that purpose is effectively hidden and not found in
| public sources. It is also entirely possible that these trading
| firms have already been trying this: using their proprietary
| knowledge and data to attempt to train a model that can act as
| a quant researcher.
| philipwhiuk wrote:
| You're not really giving them any money and it's not actually
| trading.
|
| There's no market impact to any trading decision they make.
| beezle wrote:
| What were the risk adjusted returns? Without knowing that, this
| is all kind of meaningless. Being high beta in a rising market
| doesn't equate to anything brilliant.
| mlmonkey wrote:
| > We were cautious to only run after each model's training cutoff
| dates for the LLM models
|
| Grok is constantly training and/or it has access to websearch
| internally.
|
| You cannot backtest LLMs. You can only "live" test them going
| forward.
| cheeseblubber wrote:
| Via the API you can turn off web search internally. We provided all
| the models with their own custom tools that only provided data
| up to the date of the backtest.
| mlmonkey wrote:
| But Grok is internally training on Tweets etc. continuously.
| dogmayor wrote:
| They could only trade once per day and hold 5-15 positions with a
| position size of $5k-$25k according to the agent prompt. Limited
| to say the least.
| digitcatphd wrote:
| Backtesting is a complete waste in this scenario. The models
| already know the best outcomes and are biased towards it.
| 1a527dd5 wrote:
| Time.
|
| That has been the best way to get returns.
|
| I set up a 212 account when I was looking to buy our first house.
| I bought small chunks in industries I was comfortable and
| knowledgeable in. Over the years I worked up a nice
| portfolio.
|
| Anyway, long story short. I forgot about the account, we moved
| in, got a dog, had children.
|
| And then I logged in for the first time in ages, and to my shock.
| My returns were at 110%. I've done nothing. It's bizarre and
| perplexing.
| jondwillis wrote:
| ...did you beat the market? 110% is pretty much what the nasdaq
| has done over the last 5 years
|
| Also N=1
| delijati wrote:
| time in the market beats timing the market -> Kenneth Fisher
| ... i learned it the hard way ;)
| lisbbb wrote:
| Yeah, uh, all I did was buy BRK.B like a decade ago and it's up
| 172% or something like that.
|
| The only way I have seen people outperform is by having insider
| information.
| theideaofcoffee wrote:
| "Everyone (including LLMs) is a genius in a bull market."
| apparent wrote:
| Apparently everyone (but Gemini).
| koakuma-chan wrote:
| Could Gemini end up being better over the longer term?
| scarmig wrote:
| Depends on if the market can stay irrational longer than
| Gemini stays solvent.
| mrweasel wrote:
| I was thinking the same thing. A number of coworkers were
| trading stocks a few years ago and felt pretty good about their
| skills, until someone pointed out that making good stock picks
| was easy when everything is going up. Sure enough, when the
| market started to fail, they all lost money.
|
| What could make this a bit more interesting is to tell the LLM
| to avoid the tech stocks, at least the largest ones. Then give
| it actual money, because your trades will affect the market.
| tiffani wrote:
| What was the backtesting method? Was walk-forward testing
| involved? There are different ways to backtest.
| Nevermark wrote:
| Just one run per model? That isn't backtesting. I mean
| technically it is, but "testing" implies producing meaningful
| measures.
|
| Also just one time interval? Something as trivial as "buy AI"
| could do well in one interval, and given models are going to be
| pumped about AI, ...
|
| 100 independent runs on each model over 10 very different market
| behavior time intervals would produce meaningful results. Like
| actually credible, meaningful means and standard deviations.
|
| This experiment, as is, is a very expensive unbalanced
| uncharacterizable random number generator.
| cheeseblubber wrote:
| Yes, definitely. We were using our own budget, out of our own
| pocket, and these model runs were getting expensive. Claude
| cost us around 200-300 dollars per 8-month run, for example. We
| want to scale it and get more statistically significant results
| but wanted to share something in the interim.
| Nevermark wrote:
| Got it. It is an interesting thing to explore.
| ipnon wrote:
| Yes, if these models, available for $200/month, are making 50%
| returns reliably, why isn't Citadel having layoffs?
| lisbbb wrote:
| In my experience, you get a few big winners, but since you
| have to keep placing new trades (e.g. bets) you eventually
| blow one and lose most of what you made. This is particularly
| true with options and futures trades. It's a stupid way to
| speculate with or without AI help doesn't matter and will
| never matter.
| energy123 wrote:
| To their credit, they say in the article that the results
| aren't statistically significant. It would be better if that
| disclaimer was more prominently displayed though.
|
| The tone of the article is focused on the results when it
| should be "we know the results are garbage noise, but here is
| an interesting idea".
| hhutw wrote:
| Yeah...one run per model is just a random walk in my opinion
| Marsymars wrote:
| To take it to the absurdist conclusion, you could backtest each
| LLM "which single stock should I buy on Jan 1, 2010 to maximize
| my returns over the next 15 years?"
|
| If your backtested LLM performed well, would you use the same
| strategy for the next 15 years? (I suppose there are people who
| would.)
| zer0tonin wrote:
| Not only just one run per model, but no metrics other than
| total return. If you pick stocks at random you have a very high
| chance of beating the S&P 500, so you need a bit more than that
| to make a good benchmark.
| Bender wrote:
| This experiment was also performed with a fish [1] though it was
| only given $50,000. Spoiler, the fish did great _vs wall street
| bets_.
|
| [1] - https://www.youtube.com/watch?v=USKD3vPD6ZA [video][15
| mins]
| naet wrote:
| I used to work for a brokerage API geared at algorithmic traders
| and in my anecdotal experience many strategies seem to
| work well when back-tested on paper but for various reasons can
| end up flopping when actually executed in the real market. Even
| testing a strategy in real time paper trading can end up
| differently than testing on the actual market where other parties
| are also viewing your trades and making their own responses. The
| post did list some potential disadvantages of backtesting, so
| they clearly aren't totally in the dark on it.
|
| Deepseek did not sell anything, but did well with holding a lot
| of tech stocks. I think that can be a bit of a risky strategy
| with everything in one sector, but it has been a successful one
| recently so not surprising that it performed well. Seems like
| they only get to "trade" once per day, near the market close, so
| it's not really a real time ingesting of data and making
| decisions based on that.
|
| What would really be interesting is if one of the LLMs switched
| their strategy to another sector at an appropriate time. Very
| hard to do but very impressive if done correctly. I didn't see
| that anywhere but I also didn't look deeply at every single
| trade.
| bmitc wrote:
| I've honestly never understood what backtesting even does
| because of the things you mention like time it takes to request
| and close trades (if they even do!), responses to your trades,
| the continuous and dynamic input of the market into your model,
| etc.
|
| Is there any reference that explains the deep technicalities of
| backtesting and how it is supposed to actually influence your
| model development? It seems to me that one could spend a huge
| amount of effort on backtesting that would distract from
| building out models and tooling and that that effort might not
| even pay off given that the backtesting environment is not the
| real market environment.
| tim333 wrote:
| I'm not sure about deep technicalities but backtesting is a
| useful thing to see how some strategy would have performed at
| some times in the past but there are quite a lot of
| limitations to it. Two of the big ones are the market
| reacting to you and maybe more so a kind of hindsight bias
| where you devise some strategy that would have worked great
| on past markets but the real time ones do something
| different.
|
| https://en.wikipedia.org/wiki/Long-Term_Capital_Management
| was kind of an example of both of those. They based their
| predictions on past behaviour which proved incorrect. Also if
| other market participants figure a large player is in trouble
| and going to have to sell a load of bonds they all drop their
| bids to take advantage of that.
|
| A lot of deviations from efficient market theory are like
| that - not deeply technical but about human foolishness.
| Maxatar wrote:
| We use back testing at my firm for two primary reasons, one
| as a way to verify correctness and two as a way to assess
| risk.
|
| We do not use it as a way to determine profitability.
| bmitc wrote:
| This is interesting because I'm not immediately sure how
| you verify correctness and assess risk without also
| addressing profitability.
|
| By assessing risk, is that just checking that it doesn't dump
| all your money and that you can at least maintain a stable
| investment cache?
|
| Are you willing to say more about correctness? Is the
| correctness of the models, of the software, or something
| else?
| Maxatar wrote:
| Profitability is not in any way considered a property of
| the correctness of an algorithm. An algorithm can be
| profitable and incorrect, and an algorithm can be correct
| but not profitable.
|
| Correctness has to do with whether the algorithm
| performed the intended actions in response to the
| inputs/events provided to it, nothing more. For the most
| part correctness of an algorithm can be tested the same
| way most software is tested, ie. unit tests, but it's
| also worth testing the algorithm using live data/back
| testing it since it's not feasible to cover every
| possible scenario in giant unit tests, but you can get
| pretty good coverage of a variety of real world scenarios
| by back testing.
| lisbbb wrote:
| This. This all day. I used to paper trade using ThinkOrSwim and
| I was doubling and tripling my money effortlessly. Then I
| decided to move my strategy to the real deal and it didn't do
| very well at all. It was all bs.
| chroma205 wrote:
| >but for various reasons can end up flopping when actually
| executed in the real market.
|
| 1. Your order can legally be "front run" by the lead or
| designated market maker who receives priority trade matching,
| bypassing the normal FIFO queue. Not all exchanges do this.
|
| 2. Market impact. Other participants will cancel their order,
| or increase their order size, based on your new order. And yes,
| the algos do care about your little 1 lot order.
|
| Also if you improve the price ("fill the gap"), your single 1
| qty order can cause 100 other people to follow you. This does
| not happen in paper trading.
|
| Source: HFT quant
| derrida wrote:
| Dear HFT Quant,
|
| > And yes, the algos do care about your little 1 lot order.
|
| I'm just your usual "corrupted nerd" geek with some
| mathematics and computer security background interests. Two
| questions, if I may: 1. What's the most interesting paper
| you have read recently, or unrelated thing you are
| interested in at the moment? 2. Re "the algos do care about
| your little 1 lot order": how would one see this effect you
| mentioned? It seems wildly anomalous - how would one go
| about finding it, assuming maximum mental venturesomeness, a
| tiny $100 and too much time?
| ainiriand wrote:
| Sometimes the spread is really tight.
| tim333 wrote:
| Retail speculator here. Re 2 it's often quite easy to demo
| on thinly traded markets - I'm more familiar with crypto.
| Say the spread is 81.00 buy, 81.03 sell. Put in a limit buy
| at 81.00 and watch someone/something immediately outbid you
| at 81.01. In the short term that kind of thing is done by
| algorithms but there are humans behind it and doing it too.
|
| There's quite a lot of other game playing going on also.
| gosub100 wrote:
| Even a 1 lot order could be the deciding factor for some
| algorithm that's calculating averages or other statistics.
| Especially for options books.
| this_user wrote:
| If you actually were in the industry, you would know that
| most retail traders don't fail because they lose a tick here
| or there on execution; they fail because their strategies
| have no edge in the first place.
| chroma205 wrote:
| > If you actually were in the industry, you would know that
| most retail traders don't fail because they lose a tick
| here or there on execution
|
| Where did I say "retail trader"?
|
| Because "institutional" low-latency market makers trade 1
| lot all the time.
| this_user wrote:
| The context from parent was obviously that. Instis don't
| trade on Alpaca.
|
| > Because "institutional" low-latency market makers trade
| 1 lot all the time.
|
| That sentence alone tells me that you're a LARPer.
| chroma205 wrote:
| > That sentence alone tells me that you're a LARPer
|
| cope.
|
| Equity options are sparse and have 1 order of 1 lot/qty
| per price. But usually empty. Too many prices and
| expiration dates.
|
| US treasury bond cash futures (BrokerTec) are almost
| always 1 lot orders. Multiple orders per level though.
|
| I could go on, but I'm busy as our team of 4's algos are
| printing US$500k/hour today.
| dubcanada wrote:
| There is a big difference between backtesting scalping and
| backtesting "buy 100 NVIDIA at $103 and sell at $110".
| Maxatar wrote:
| >Your order can legally be "front run" by the lead or
| designated market maker who receives priority trade matching,
| bypassing the normal FIFO queue. Not all exchanges do this.
|
| Unless you're thinking of some obscure exchange in a tiny
| market, this is just untrue in the U.S., Europe, Canada, and
| APAC. There are no exchanges where market makers get any kind
| of priority to bypass the FIFO queue.
| chroma205 wrote:
| > There are no exchanges where market makers get any kind
| of priority to bypass the FIFO queue.
|
| Nope, several large, active, and liquid markets in the US.
|
| Legally it's not named "bypass the FIFO queue". That would
| be dumb.
|
| In practice, it goes by politically correct names such as
| "designated market maker fill" or "institutional order
| prioritization" or "leveling round".
| Maxatar wrote:
| I can tell you as someone who is a designated market
| maker on several ETFs in the U.S., none of this exists as
| a means of giving market makers priority fills. You're
| taking existing terms and misusing them. For example
| institutional order prioritization is used as a wash
| trade prevention mechanism, not as a way for designated
| market makers to get some kind of fill preference.
| Leveling rounds also do not involve exchanges; this is an
| internal tool used by a broker's OMS to rebalance
| residuals so accounts end up with the intended
| allocation, or to clean up odd-lot/mixed-lot leftovers.
|
| I am getting the feeling you either are not actually a
| quant, or you were a quant and just misheard and confused
| a lot of things together, but one thing is for sure...
| your claim that market makers get some kind of priority
| fills is factually incorrect.
| ddtaylor wrote:
| Alpaca?
| acrooks wrote:
| A really important part of this is the emotional component.
| When real money is involved, you will sometimes face actual
| losses. It's hard for a human to completely trust the machine
| in real-world trading.
| andoando wrote:
| Backtesting is useless because if you try out a million
| strategies, by chance you will find one that works on past
| data.
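|
| A rough back-of-the-envelope sketch of that selection effect
| (toy numbers of my own, nothing to do with the article's
| actual setup):
|
|     import random
|
|     random.seed(0)
|     N_STRATEGIES = 10_000   # "strategies" tried, all pure noise
|     N_DAYS = 168            # roughly 8 months of trading days
|     DAILY_VOL = 0.02        # 2% daily moves, zero true edge
|
|     best = 0.0
|     lucky = 0
|     for _ in range(N_STRATEGIES):
|         total = 1.0
|         for _ in range(N_DAYS):
|             total *= 1.0 + random.gauss(0.0, DAILY_VOL)
|         best = max(best, total)
|         if total >= 1.10:   # "beat" a +10% benchmark by luck alone
|             lucky += 1
|
|     print(f"best no-edge strategy: {best - 1.0:+.0%}")
|     print(f"{lucky}/{N_STRATEGIES} cleared +10% purely by chance")
|
| Search over enough junk and something always "works" on past
| data.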
| copypaper wrote:
| >Each model gets access to market data, news APIs, company
| financials...
|
| The article is very very vague on their methodology (unless I
| missed it somewhere else?). All I read was, "we gave AI access to
| market data and forced it to make trades". How often did these
| models run? Once a day? In a loop continuously? Did it have
| access to indicators (such as RSI)? Could it do arbitrary
| calculations with raw data? Etc...
|
| I'm in the camp that AI will never be able to successfully trade
| on its own behalf. I know a couple of successful traders (and
| many unsuccessful!), and it took them years of learning and
| understanding before breaking even. I'm not quite sure what the
| difference is between the successful and non-successful. Some
| sort of subconscious knowledge from staring at charts all day? A
| level of intuition? Regardless, it's more than just market data
| and news.
|
| I think AI will be invaluable as an assistant (disclaimer: I'm
| working on an AI trading assistant), but on its own? Never. Some
| things simply can't be solved with AI, and I think this is one
| of them. I'm open to being wrong, but nothing has convinced me
| otherwise.
| XenophileJKO wrote:
| So.. I have been using an LLM to make 30 day buy and hold
| portfolios. And the results are "ok". (Like 8% vs 6% for the S&P
| 500 over the last 90 days)
|
| What you ask the model to do is super important. Just like with
| writing or coding, the default "behavior" is likely to be
| "average", so you need to be very careful about what you are
| asking for.
|
| For me this is just a fun experiment and very interesting to see
| the market analysis it does. I started with o3 and now I'm using
| 5.1 Thinking (set to max).
|
| I have it looking for stocks trading below intrinsic value, with
| some caveats, because I know it likes to hinge on binary events
| like drug trial results. I also have it look at correlation
| across the positions and make sure they don't share the same
| macro vulnerability.
|
| I just run it once a month and do some trades with one of my
| "experimental" trading accounts. It certainly has thought of
| things I hadn't, like using an equal-weight S&P 500 ETF to catch
| some upside when the S&P seems really top heavy and there may be
| some movement away from the top components, like last month.
| themafia wrote:
| I look for issues with a recent double bottom and high insider
| buy activity. I've found this to be a highly reliable set of
| signals.
| XenophileJKO wrote:
| That is interesting.
|
| I was trying to not be "very" prescriptive. My initial
| impression was, if you don't tell it to look at intrinsic
| value, the model will look at meme or very common stocks too
| much. Alternatively specifying an investing persona would
| probably also move it out of that default behavior profile.
| You have to kind of tell it what to care about. This wasn't
| necessarily about trying to maximize a strategy; it was more
| about learning what kinds of things it would focus on, what
| kind of analysis.
| dismalaf wrote:
| Back when I was in university we used statistical techniques
| similar to what LLMs use to predict the stock market. It's not a
| surprise that LLMs would do well over this time period. The
| problem is that when the market turns and bucks trends, they
| don't do so well; you need to intervene.
| cedws wrote:
| Backtesting for 8 months is not rigorous enough and also this
| site has no source code or detailed methodology. Not worth the
| click.
| _alternator_ wrote:
| Wait, they didn't give them real money. They simulated the
| results.
| petesergeant wrote:
| If I'm reading this right, almost all of Grok's advantage comes
| from heavy bets on semiconductors spiking: ASML, INTC, MU.
| mikewarot wrote:
| They weren't doing it in real time, thus it's possible that the
| LLMs might have had undisclosed perfect knowledge of the actual
| history of the market. Only a real-time study is going to
| eliminate this possibility.
| itake wrote:
| Model output is non-deterministic.
|
| Did they make 10 calls per decision and then choose the majority?
| or did they just recreate the monkey picking stocks strategy?
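|
| Something like this is the bare minimum I'd expect (purely
| hypothetical; ask_model stands in for whatever API call they
| actually make):
|
|     from collections import Counter
|
|     def ask_model(prompt: str) -> str:
|         # Hypothetical stand-in for a single LLM API call that
|         # returns one action string, e.g. "BUY NVDA" or "HOLD".
|         raise NotImplementedError
|
|     def majority_decision(prompt: str, n_samples: int = 10) -> str:
|         # Sample the same prompt several times; only act when a
|         # clear majority of completions agree, otherwise do nothing.
|         votes = Counter(ask_model(prompt) for _ in range(n_samples))
|         action, count = votes.most_common(1)[0]
|         return action if count > n_samples // 2 else "HOLD"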
| ta12653421 wrote:
| ++1
|
| This.
|
| That's also the reason why I still believe in "classic
| instruments" when configuring my trade app; the model won't give
| you the same entries on, let's say, 5 questions.
| hoerzu wrote:
| How many trades? What's the z-score?
| hoerzu wrote:
| For backtesting LLMs on Polymarket, I built this. You can try it
| with live data, without signing up, at: https://timba.fun
| luccabz wrote:
| we should:
|
| 1. train with a cutoff date at ~2006
|
| 2. simulate information flow (financial data, news, earnings,
| ...) day by day
|
| 3. measure if any model predicts the 2008 collapse, how confident
| they are in the prediction and how far in advance
| stuffn wrote:
| Trading in a nearly 20 year bull market and doing well is not an
| accomplishment.
| dehrmann wrote:
| Is it just prompting LLMs with "I have $100k to invest. Here are
| all publicly traded stocks and a few stats on them. Which stocks
| should I buy?" And repeat daily, rebalancing as needed?
|
| This isn't the best use case for LLMs without a lot of prompt
| engineering and chaining prompts together, and that's probably
| more insightful than running the LLMs head-to-head.
| client4 wrote:
| The obvious next question is: does the AI on cocaine outperform?
| https://pihk.ai/
| Genego wrote:
| When I see stuff like this, I feel like rereading the Incerto by
| Taleb just to refresh and sharpen my bullshit senses.
| bwfan123 wrote:
| LLMs are the fad of the day, and this sort of article provokes
| the natural get-rich-quick greed inherent in all of us,
| especially the young tech types. As such they are clickbait,
| and also a barometer of the widespread silliness.
|
| I am curious why re-reading the Incerto sharpens your bullshit
| sense. I have read a few in that series, but didn't see it as
| sharpening my bullshit sensor.
| dhosek wrote:
| I wouldn't trust any backtesting with these models. Try
| doing a real-time test over 8 months and see what happens then.
| I'd also be suspicious of anything that doesn't take actual costs
| into account.
| rallies wrote:
| We're running some live experiments these days, for both stocks
| and options. https://rallies.ai/arena
| philipwhiuk wrote:
| With actual money? Or still fake money?
| wowamit wrote:
| Is finding the right stocks to invest in an LLM problem? Language
| models aren't the right fit, I would presume. It would also be
| insightful to compare this with traditional ML models.
| XCSme wrote:
| If it's backtesting on data older than the model, then the strategy
| can have lookahead bias, because the model might already know
| what big events will happen that can influence the stock markets.
| lvspiff wrote:
| I set up real-life accounts with E*Trade and Fidelity: the
| E*Trade auto portfolio, an advisor at Fidelity for retirement,
| and then a basket portfolio I picked myself using MS365 with
| Grok 5 and various articles and strategies, choosing a set of 5
| ETFs that would roughly match the exposure of the other two.
|
| So far this year all are beating the S&P percentage-wise (only
| by <1% though), but the AI basket is doing the best, or at least
| is on par with my advisor, and it's getting to the point where
| E*Trade's auto investment strategy at least isn't worth it. It's
| been an interesting battle to watch as each rebalances at
| varying times, as I put more funds into each, and as profits
| from the solid gains get moved to more stable areas. This is
| only with a few k in each account other than retirement, but
| it's still fun to see things play out this year.
|
| In other words, though, I'm not surprised at all by the results.
| AI isn't something to day trade with yet, but it is helpful in
| doing research for your desired long-term risk exposure, IMO.
| lisbbb wrote:
| How much are the expense ratios on those ETFs you chose,
| though? I mean, Vanguard, Fidelity, BlackRock, and others have
| extremely low-cost funds and ETFs, and it has been shown year
| after year and decade after decade that you can't beat their
| average returns over the long term. Indexing works for a
| reason. Beating something by 1%? It's not even worth it if your
| costs and taxes are higher than that.
| IncreasePosts wrote:
| Just picking tech stocks and winning isn't interesting unless we
| know the thesis behind picking the tech stocks.
|
| Instead, maybe a better test would be to give it 100 mid-cap
| stocks, make it continually rebalance its portfolio among those
| 100 stocks, and then test the performance.
| refactor_master wrote:
| Should have done GME only. Now THAT would've been interesting:
| seeing how much they'd end up losing on it.
|
| Just riding a bubble up for 8 months with no consequences is not
| an indicator of anything.
| btbuildem wrote:
| It turns out DeepSeek only made BUY trades (not a single SELL in
| the history in their live example) -- so basically, buy & hold
| strategy wins, again.
| culi wrote:
| this study should be replicated during a bear market
| bmitc wrote:
| Buy and hold performs well over long time scales by simply
| not adjusting based upon sentiment.
| throwawayffffas wrote:
| The operative word is "long": historically, if you entered the
| market just before a downturn, it could take years, up to a
| couple of decades, to make it back, depending on which downturn
| we are looking at.
| bmitc wrote:
| I think that requires entering once. I was referring to
| continuing to enter periodically and holding.
| darepublic wrote:
| So in other words I should have listened to the YouTube brainrot
| and asked ChatGPT for my trades. Sigh.
| theymademe wrote:
| prince of zamunda LLM edition or whatever that movie was based on
| that book was based on the realization how pathetic it all was
| based on was? .... yeah, some did a good one on ya. just imagine
| evaluating that offspring one or two generations later ... ffs,
| _this_ is sooooooooooooooo embarrassing
| 867-5309 wrote:
| tl;dr https://www.aitradearena.com/blog/llm-performance-chart.png
| 867-5309 wrote:
| GPT-5 was released _4_ months ago..
| regnull wrote:
| I'm working on a project where you can run your own experiment
| (or use it for real trading): https://portfoliogenius.ai. Still a
| bit rough, but most of the main functionality works.
| hsuduebc2 wrote:
| In a bullish market where a few companies are creating a bubble,
| does this benchmark have any informational value? Wouldn't it be
| better to run this on randomly chosen intervals from past years?
| mempko wrote:
| The stats are abysmal. What's the MDD (max drawdown) compared to
| the S&P 500? What is the Sortino ratio? What are the confidence
| intervals for all the stats? Number of trades? So many
| questions...
| energy123 wrote:
| One of the recent NeurIPS best paper recipients is relevant here:
| https://openreview.net/forum?id=saDOrrnNTz
|
| > an extensive empirical study across more than 70 models,
| revealing the Artificial Hivemind effect: pronounced intra- and
| inter-model homogenization
|
| So the inter-model variety will be exceptionally low. Users of
| LLMs will intuitively know this already, of course.
| keepamovin wrote:
| I'd say Grok did best because it has the best access to
| information. Grok's deep search and real-time knowledge
| capabilities, thanks to the X integration and just generally
| being plugged into the pulse of the Internet, are really best in
| class. It's a great OSINT research tool.
|
| Interesting how this research seems to tease out a truth traders
| have known for eons: picking stocks is mostly about having
| information, maybe a little bit of asymmetric information from
| good research, and not necessarily about all the analysis that
| can be done (that's important, but information is king), because
| it's a speculative market that's collectively reacting to those
| kinds of signals.
| stockresearcher wrote:
| I appreciate that you've made the trade histories downloadable
| and will be taking a look to see what I can learn.
|
| I've glanced over some of it and really wonder why they seemed to
| focus on a small group of stocks.
| aperture147 wrote:
| Why is my bullshit detector ringing like hell right now? This
| sounds like another billion-dollar Markov-chain IP that claims
| it will change the world, opening with a paper that passes with
| flying colors.
| frobisher wrote:
| lolol Gemini
| rallies wrote:
| This is pretty cool.
|
| We're also running a live experiment on both stocks and options.
| One difference with our experiment is a lot more tools being
| available to the models (anything you can think of, sec filings,
| fundamentals, live pricing, options data).
|
| We think backtests are meaningless given LLMs have mostly
| memorized every single thing that happened, so it's not a good
| test. Instead we're running a forward test. Not enough data for
| now, but pretty interesting initial results:
|
| https://rallies.ai/arena
| touristtam wrote:
| How is Qwen so much worse than the rest (for the period
| accounted)?
| natiman1000 wrote:
| Are the code/prompts used open source? If not, how can we say
| it's legit?
| nurettin wrote:
| Deepseek and grok together would perform even better.
| vpribish wrote:
| this is so stupid i wish i could flag it twice
| Frieren wrote:
| flagged it for you
| aidenn0 wrote:
| It seems to me that short-term simulations will tend to
| underprice risk.
|
| Imagine a market where you can buy only two stocks:
|
| Stock A goes up invariably 1% per month
|
| Stock B goes up 1.5% per month with a 99% chance, but loses 99%
| of its value with a 1% chance.
|
| Stock B has a 94% chance of beating stock A on a 6 month
| simulation, but only a 30% chance of beating stock A on a 10 year
| simulation.
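|
| A quick sanity check of those numbers (my own arithmetic, not
| from the article):
|
|     # Stock A compounds 1%/month with certainty; Stock B gains
|     # 1.5%/month with prob 0.99 and drops 99% with prob 0.01.
|     # Even a single crash leaves B so far behind that it only
|     # beats A if it never crashes, so P(B wins) = 0.99 ** months.
|     for months in (6, 120):
|         a = 1.01 ** months
|         b_one_crash = (1.015 ** (months - 1)) * 0.01
|         assert b_one_crash < a          # any crash means B loses
|         print(f"{months:>3} months: P(B beats A) = {0.99 ** months:.0%}")
|     # ->   6 months: 94%, 120 months: 30%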
| fortran77 wrote:
| I would love to see this run during an extended bear market
| period.
| ta12653421 wrote:
| Can't the model go short in a bear market?
| toephu2 wrote:
| Predicting stock prices means you are competing directly against
| massive hedge funds and professional quant teams with effectively
| unlimited budgets and large teams of engineers. These
| professionals are already using and constantly tweaking the
| latest models to gain an advantage.
|
| It is highly unlikely that you guys, or any individual, even
| utilizing the latest LLMs, will consistently discover an edge
| that beats the market over the long run.
| pech0rin wrote:
| 8 months of a huge bull market. Not exactly indicative of any
| real insight.
| rcarmo wrote:
| I spent a while looking at trading algos a few years back (partly
| because of quant stuff I got involved in, and partly out of
| curiosity). I found that none of the "slow" trading (i.e., that
| you could run at home alongside your day trading account) was
| substantially effective (at least in my sampling), but I never
| thought an LLM would be any good at it because all the analysis
| is quantitative, not qualitative or contextual.
|
| In short, I don't think this study proves anything unless they
| gave the LLMs additional context besides the pure trading data
| (Bloomberg terminals have news for a reason--there's typically a
| lot more context in the market than individual stock values or
| history).
| morgengold wrote:
| Am I right that you let LLMs decide for themselves what to read
| into their input data (like market data, news APIs, company
| financials)? While this is worth testing, I think it would be
| more interesting to give them patterns to look for. I played
| around with using them for technical analysis and let them make
| the associations with past stock performances. They can even
| differentiate on what worked in the last 5 years, what in the
| last year, what in the last 3 months, etc. This way they can
| (hopefully) pick up changes in market behavior. Generally, the
| main strength of this approach is to use their pattern
| recognition capability and also take the human factor (emotions)
| out of trading decisions.
| Bombthecat wrote:
| I wouldn't call this a test. I would create a test portfolio of
| a hundred semi-random stocks and see what they sell, buy, or
| keep.
|
| That tells me way more than "YOLO tech stocks".
| bitmasher9 wrote:
| 1. Backtesting doesn't mean very much. For lots of reasons real
| trading is different than backtesting.
|
| 2. 8 months is an incredibly short trading window. I care where
| the market will be in 8 years way more than where it will be in
| 8 months.
| ryandvm wrote:
| It seems like back-testing an LLM is going to require
| significant white-washing of the test data to prevent the LLM
| from just trading on historical trends it is aware of.
|
| Scrubbing symbol names wouldn't even be enough because I
| suspect some of these LLMs could "figure out" which stock is,
| say NVDA, based on the topology of its performance graph.
| amelius wrote:
| Nonsense. Title should read $0 because they didn't use actual
| money.
|
| Also, it seems pretty stupid to use commodity tech like LLMs for
| this.
| FrustratedMonky wrote:
| How much of this is just because the market as a whole is going
| up?
|
| This same kind of mentality happened pre-2008. People thought
| they were great at being day-traders, and had all kinds of
| algorithms that were 'beating the market'.
|
| But it was just that the entire market was going up. They weren't
| doing anything special.
|
| Once the market turned downward, that was when it took talent to
| stay even. Show me these things beating a
| downward market.
| throwawayffffas wrote:
| > We also built a way to simulate what an agent would have seen
| at any point in the past. Each model gets access to market data,
| news APIs, company financials--but all time filtered: agents see
| only what would have been available on that specific day during
| the test period.
|
| That's not going to work, these agents especially the larger
| ones, will have news about the companies embedded in their
| weights.
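|
| Presumably the filtering is just something like this on the
| tool side (my guess, not their actual code):
|
|     import datetime as dt
|
|     PriceRow = tuple[dt.date, float]
|
|     # Point-in-time lookup: the agent can only ever see rows
|     # dated on or before the simulated "today" of the backtest.
|     def prices_as_of(history: dict[str, list[PriceRow]],
|                      ticker: str, as_of: dt.date) -> list[PriceRow]:
|         return [(d, p) for d, p in history[ticker] if d <= as_of]
|
| That keeps the tool outputs clean, but it does nothing about
| what the model already absorbed during pretraining.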
| devilsbabe wrote:
| Funny how if you kept reading before commenting, they addressed
| that point specifically
|
| > We were cautious to only run after each model's training
| cutoff dates for the LLM models. That way we could be sure
| models couldn't have memorized market outcomes.
| thedougd wrote:
| Would be nice to use the logos in the legend. I use these LLMs
| every day and didn't know what half these logos on the graph were.
| krauses wrote:
| I'd like to see a variation where the models are fine-tuned on
| the investments of those members of Congress who seem to
| consistently outperform the markets.
| RandomLensman wrote:
| Could be interesting to see performance distribution for random
| strategies on that stock universe as a comparison. The reverse
| could also be interesting: how do the models perform on data that
| is random?
| mvkel wrote:
| When the market is rising, everyone looks like a genius.
|
| Would have been better to have variants of each, locked to
| specific industries.
|
| It also sounds like they were -forced- to make trades every day.
| Why? Deciding not to trade is a good strategy too.
| cramcgrab wrote:
| Yeah I've been using grok to manage my yolo fund, it's been doing
| great so far, up around 178% ytd, only rebalance once every other
| month.
| portly wrote:
| What is the point of this?
|
| LLMs are trained to predict the next word in a text. In what way,
| shape or form does that have anything to do with stock market
| prediction? Completely ridiculous AI bubble nonsense.
| another_twist wrote:
| No it isn't. Next-word prediction is what humans do to
| communicate anyway, so the criticism isn't valid. Except you do
| that for your own sentences (if you do it for others it's
| considered rude :) ).
|
| Anyway, this criticism is now dated given that modern-day LLMs
| can solve unseen reasoning problems such as those found in the
| IMO.
|
| It does have something to do with the stock market, since it's
| about making hypotheses and trading based on them. However, I'd
| agree that making a proper trading AI here would require
| reasoning-based fine-tuning for stock market trading actions,
| sort of like running GRPO with market feedback as the reward.
| The article's authors simply can't do that, since they don't
| have access to the underlying model weights.
| bwfan123 wrote:
| shhh. We need more of these as counter-parties to improve
| alpha.
| mvkel wrote:
| Predicting the stock market will likely never happen because it's
| recursive. We can predict the next 10 days of weather, but the
| weather doesn't change because it read your forecast. As long as
| markets continue to react to their own reactions, they will
| remain unpredictable.
|
| If the strategy is long, there might be alpha to be found. But
| day trading? No way.
| oersted wrote:
| If stocks are more of a closed system that is weakly affected
| by external factors in the short term, now I finally understand
| why they hire so many physicists for financial modeling!
|
| There is of course the fact that physicists tend to be the best
| applied mathematicians, even if they don't end up using any of
| their physics knowledge. And they generally had the reputation
| of "the smartest" people for the last century.
|
| Anyway, such systems are complex and chaotic yes, but there are
| many ways of predicting aspects of them, like with fluid
| simulation to give a basic example. And I don't get your point
| about weather, it is also recursive in the same way and
| reacting to its own reactions. Sure it is not reacting to
| predictions of itself, but that's just a special kind of
| reaction, and patterns in others predictions can definitely be
| predicted accurately, perhaps not individually but in the
| aggregate.
| mvkel wrote:
| > there are many ways of predicting aspects of them
|
| Yes, and it's priced in
|
| > but that's just a special kind of reaction
|
| That's just arguing semantics. My point was that weather
| doesn't react to human predictions, explicitly
| jerf wrote:
| "We can predict the next 10 days of weather, but the weather
| doesn't change because it read your forecast."
|
| Less true than it used to be, with cloud seeding being an off-
| the-shelf technology now. Still largely true, but not entirely
| true anymore.
| machiaweliczny wrote:
| > Potential accidental data leakage from the "future"
|
| Exactly. Makes no sense with models like Grok. DeepSeek also
| likely has this leak, as it was trained later.
| Glyptodon wrote:
| Multiple runs of randomized backtesting seem needed for this to
| mean anything. It's also not clear to me how there's any kind of
| information update loop. Maybe I didn't read closely enough.
| kqr wrote:
| Extremely similar earlier submission but focused on
| cryptocurrencies, using real money, and in real time:
| https://news.ycombinator.com/item?id=45976832
|
| I'm extremely skeptical of any attempt to prevent leakage of
| future results to LLMs evaluated on backtesting. Both because
| this has been shown in the literature to be difficult, and
| because I personally found it very difficult when working with
| LLMs for forecasting.
| kqr wrote:
| Their annual geometric mean return is 45 %! That's some serious
| overbetting. In a market that didn't accidentally align with
| their biases, they would have lost money very quickly.
| reactordev wrote:
| I would love for them to have included a peg position on SPY @
| 100k over the course of the same period. Gives a much better
| benchmark of what an LLM can do (not much above 2-4%).
|
| Still, cool to see others in my niche hobby of finding the money
| printer.
| peterbonney wrote:
| The devil is really in the details on how the orders were
| executed in the backtest, slippage, etc. Instead of comparing to
| the S&P 500 I'd love to see it benchmarked against a range of
| active strategies, including common non-AI approaches (e.g. mean
| reversion, momentum, basic value focus, basic growth focus, etc.)
| and some simple predictive (non-generative) AI models. This would
| help shake out whether there is selection alpha coming out of the
| models, or whether there is execution alpha coming out of the
| backtest.
| rao-v wrote:
| I'd rather give an LLM the earnings report for a stock and the
| next day's S&P 500 opening and see if it can predict the opening
| price.
|
| Expecting an LLM to magically beat efficient market theory is a
| bit silly.
|
| Much more reasonable to see if it can incorporate information as
| well as the market does (to start)
| natiman1000 wrote:
| If the code and prompts are not open source, how can we trust
| anything y'all say?
| elzbardico wrote:
| A rising tide lift all boats.
| dudeinhawaii wrote:
| This is the complete wrong way to do this. I say this as someone
| who does work in this area of leveraging LLMs to a limited degree
| in trading.
|
| LLMs are naive, easily convinced, and myopic. They're also non-
| deterministic. We have no way of knowing whether, if you ran
| this little experiment 10 times, they'd each pick something
| else. This is scattershot plus luck.
|
| The RIGHT way to do this is to first solve the underlying problem
| deterministically. That is, you first write your trading
| algorithm that's been thoroughly tested. THEN you can surface
| metadata to LLMs and say things along the lines of "given this
| data + data you pull from the web", make your trade decision for
| this time period and provide justification.
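|
| Roughly the division of labor I mean, as a toy sketch (the
| moving-average rule and the llm_review stub are placeholders,
| not anyone's real system):
|
|     def sma_signal(prices: list[float], fast: int = 10,
|                    slow: int = 50) -> str:
|         # Deterministic core: a plain moving-average crossover
|         # that can be unit tested and backtested on its own.
|         if len(prices) < slow:
|             return "HOLD"
|         fast_ma = sum(prices[-fast:]) / fast
|         slow_ma = sum(prices[-slow:]) / slow
|         return "BUY" if fast_ma > slow_ma else "SELL"
|
|     def llm_review(signal: str, headlines: list[str]) -> bool:
|         # Placeholder for the LLM step: given the algo's signal
|         # plus news metadata, answer whether to proceed and why.
|         return True  # stub; the real call would hit an LLM API
|
|     def decide(prices: list[float], headlines: list[str]) -> str:
|         signal = sma_signal(prices)
|         if signal != "HOLD" and not llm_review(signal, headlines):
|             return "HOLD"
|         return signal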
|
| Honestly, adding LLMs directly to any trading pipeline just adds
| non-useful non-deterministic behavior.
|
| The main value is speed of wiring up something like sentiment
| analysis as a value add or algorithmic supplement. Even this
| should be done using proper ML but I see the most value in using
| LLMs to shortcut ML things that would require time/money/compute.
| Trading value now for value later (the ML algorithm would
| ultimately run cheaper long-run but take longer to get into
| prod).
|
| This experiment, like most "I used AI to trade" blogs, is
| completely naive in its approach. They're taking the lowest-
| hanging fruit possible. Worse still when those results are just
| the rising tide lifting all boats.
|
| _Edit_ (was a bit harsh): This experiment is an example of the
| kind of embarrassingly obvious thing people try with LLMs,
| without understanding the domain, and then write up. To an
| outsider it can sound exciting. To an insider it's like seeing a
| news story saying "LLMs are designing new CPUs!". No they're
| not. A more useful bit of research would be to control for the
| various variables (sector exposure etc.), then run it 10_000
| times and report back on how LLM A skews towards always buying
| tech and LLM B skews towards always recommending safe stocks.
|
| Alternatively, if they showed the LLM taking a step back and
| saying "ah, let me design this quant algo to select the best
| stocks" -- and then succeeding -- I'd be impressed. I'd also know
| that it was learned from every quant that had AI double check
| their calculations/models/python.. but that's a different point.
| snapdeficit wrote:
| Anyone who traded tech stocks in the 1990s when AmeriTrade
| appeared remembers this story.
|
| Have the LLMS trade anything BUT tech stocks and see how they do.
|
| That's the real test.
|
| EDIT: As I remember, this was probably before AmeriTrade offered
| options. I was calling in trades at 6:30 AM PST to my broker while
| he probably laughed at me. But the point is the same: any doofus
| could make money buying tech stocks and holding for a few weeks.
| Companies were splitting constantly.
___________________________________________________________________
(page generated 2025-12-05 23:01 UTC)