[HN Gopher] The anatomy of an ML-powered stock picking engine
___________________________________________________________________
The anatomy of an ML-powered stock picking engine
Author : muggermuch
Score : 140 points
Date : 2022-09-27 17:01 UTC (5 hours ago)
(HTM) web link (principiamundi.com)
(TXT) w3m dump (principiamundi.com)
| igorkraw wrote:
| Nice writeup, thank you for sharing so openly!
|
| The three things I always want to know from stock picking ML
| people:
|
| 1. Did you put your own money in it?
|
| 2. How'd it go?
|
| 3. How well does your engine do vs a fixed stock allocation based
| on trend statistics computed on the whole time window (i.e.,
| compared to a fixed optimal portfolio computed with mean/std
| values you don't have access to, but which isn't allowed to
| change its choice)? What's the regret, if you are familiar with
| online learning?
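|
| A minimal sketch of the regret in question 3, under a loud
| simplification: the hindsight comparator here is the single best
| asset held for the whole window, not the mean/std-optimal
| portfolio the question describes. The return series, the 50/50
| strategy, and all function names are hypothetical.

```python
import math


def final_wealth(weights_per_period, returns_per_period):
    """Compound a starting wealth of 1.0 through every period."""
    wealth = 1.0
    for weights, returns in zip(weights_per_period, returns_per_period):
        wealth *= 1.0 + sum(w * r for w, r in zip(weights, returns))
    return wealth


def best_fixed_asset_wealth(returns_per_period):
    """Wealth from holding the single best asset, picked in hindsight."""
    n_assets = len(returns_per_period[0])
    n_periods = len(returns_per_period)
    candidates = []
    for i in range(n_assets):
        one_hot = [1.0 if j == i else 0.0 for j in range(n_assets)]
        candidates.append(final_wealth([one_hot] * n_periods, returns_per_period))
    return max(candidates)


def log_wealth_regret(weights_per_period, returns_per_period):
    """Log-wealth gap between the hindsight comparator and the strategy."""
    best = best_fixed_asset_wealth(returns_per_period)
    own = final_wealth(weights_per_period, returns_per_period)
    return math.log(best) - math.log(own)


# Hypothetical per-period simple returns for two assets.
returns = [[0.10, -0.05], [-0.02, 0.03], [0.04, 0.01]]
# Hypothetical strategy: rebalance to 50/50 every period.
strategy = [[0.5, 0.5]] * len(returns)
print(log_wealth_regret(strategy, returns))
```

| A regret near zero means the adaptive strategy gave up little
| against a choice that was fixed with knowledge of the future.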
| muggermuch wrote:
| Thank you for appreciating the article; I tried to disclose all
| that I could!
|
| 1. Yes, I did put my own money in it (low 6 figures).
|
| 2. It went as described in the article - for the capital I
| allocated to Didact, I beat the market (SPY) by ~20% since
| inception.
|
| 3. If I understand your question correctly, this would be the
| equivalent of the payoff on an optimal lookback option
| (https://en.wikipedia.org/wiki/Lookback_option). I haven't
| actually done that analysis, but it sounds like a nice idea.
| adamsmith143 wrote:
| >2. It went as described in the article - for the capital I
| allocated to Didact, I beat the market (SPY) by ~20% since
| inception.
|
| This seems extremely hard to believe. You should be running a
| multi-billion $ Quant fund if this is the case. The idea that
| you would try to push this as a newsletter rather than just
| taking investor money and becoming a billionaire literally
| makes the story seem farcical.
| HFguy wrote:
| It is very easy to believe.
|
| I could have flipped a coin, gone long or short at the
| beginning of this year.
|
| I would have had a 50% chance of outperforming the market
| by 40% this year (given it is down roughly 20%).
| [deleted]
| muggermuch wrote:
| >You should be running a multi-billion $ Quant fund if this
| is the case.
|
| You seem to underestimate the level of effort and rigor
| required to achieve this level of capital allocation. In
| contrast, beating the market by 20% is table stakes. Folks
| in the industry do it all the time; the difference here
| simply is that I built an ML-powered engine to do it
| systematically.
| colinmhayes wrote:
| Starting a hedge fund is a lot harder than beating the
| market by 20%.
| gbasin wrote:
| If your predictions are good, I'd be happy to get you $100
| million in assets to manage. It's very unlikely that your
| predictions are good...
| notacop31337 wrote:
| It's very unlikely that you're able to get OP $100 million in
| assets...
| mbarras_ing wrote:
| Brilliantly written. As someone considering a move into the Quant
| field it is very informative.
| muggermuch wrote:
| Thank you!
| ajoseps wrote:
| this is very cool! where did you get your data from and how's the
| transition to airflow?
| muggermuch wrote:
| There are commercial feeds available via Nasdaq DataLink (FKA
| Quandl). I also bought bulk historical data to feed through my
| backtester (I haven't talked about this in the post; it was
| getting to be a bit too long).
| timeserious wrote:
| Let's get a write up of your backtesting framework too
| please! Terrific post @muggermuch - thank you!
| Joel_Mckay wrote:
| Every gambler thinks they have a system, but often fails to
| recognize a game is unfair long before they arrived. lol =)
| darepublic wrote:
| You can think outside the box to beat the unfair game but then
| you end up in jail.
| jesuslop wrote:
| Nice report. How did you do risk management? Have you been
| leveraged? Have you paid for data? Kudos for a view from the
| trenches.
| muggermuch wrote:
| Thank you!
|
| > How did you do risk management?
|
| I put in a basic position management layer (1% fixed stop).
| Also, the market regime module would modulate participation,
| i.e. in really risky environments it would dial down the number
| of stock picks. I can definitely do much more on this front, but
| I wanted to nail down the stock picking first! :)
|
| > Have you been leveraged?
|
| No leverage.
|
| > Have you paid for data?
|
| Yes, my monthly running costs for data are ~$1.2k.
| pneumatic1 wrote:
| Have you looked into Kelly criterion?
| muggermuch wrote:
| Yes! I use fractional Kelly extensively in my (separate)
| higher-frequency strategies (on MES/ES/NQ/VX futures).
|
| I'm thinking of writing some follow-up posts on how to
| reason about ML-driven strategies in an intraday setting.
| Thanks to low-cost brokerages, there's a lot of alpha that
| can be captured by small league speculators such as myself.
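|
| For readers who have not met the Kelly criterion, a minimal
| sketch assuming a simple binary bet (win with probability p at
| odds b, otherwise lose the stake). This is the textbook form, not
| necessarily how it is applied to the futures strategies mentioned
| above; the win probability, odds, and scaling fraction are
| hypothetical.

```python
def kelly_fraction(win_prob, win_loss_ratio):
    """Full-Kelly bankroll fraction for a binary bet: f* = p - (1 - p) / b."""
    if not 0.0 <= win_prob <= 1.0:
        raise ValueError("win_prob must be in [0, 1]")
    if win_loss_ratio <= 0.0:
        raise ValueError("win_loss_ratio must be positive")
    return win_prob - (1.0 - win_prob) / win_loss_ratio


def fractional_kelly(win_prob, win_loss_ratio, fraction=0.5):
    """Scale full Kelly down by `fraction`; stake nothing on a negative edge."""
    return max(0.0, fraction * kelly_fraction(win_prob, win_loss_ratio))


# Hypothetical edge: 55% win rate at even odds.
full = kelly_fraction(0.55, 1.0)
half = fractional_kelly(0.55, 1.0, fraction=0.5)
print(full, half)
```

| Betting a fraction of full Kelly gives up some expected growth in
| exchange for smaller drawdowns and more tolerance for a
| misestimated edge.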
| muggermuch wrote:
| Hi, fellow HN'ers! Author here, please let me know if you have
| any questions or thoughts!
| krschultz wrote:
| I'm not at all interested in finance / stock picking but found
| this to be one of the best walkthroughs of an ML system end-to-
| end that I've ever read. I'm not in the field of ML but I'm
| interested in learning more and this was fantastic, thank you.
| muggermuch wrote:
| Thank you so much for your kind words! Your comment made my
| day! :)
| dennisy wrote:
| This is great! Thanks for writing this!
|
| I have wanted to do something like this for a while, purely for
| learning. The thing which puts me off is that there is a huge
| amount of knowledge needed in understanding the features vs the
| ML.
|
| Could you recommend a base system / reference one could use to
| get started which explains or bakes in some of the feature /
| signals engineering work?
|
| Also would this approach work with crypto?
| muggermuch wrote:
| > Also would this approach work with crypto?
|
| Some of it works on crypto. TBH I've stayed away from the
| asset class, but only because I find it difficult to build
| mental models and think about features (in my mind, it's a
| mix of commodity factors and currency factors, but I'd have
| to test it out).
|
| I seem to remember coming across papers that have tested
| momentum factors at larger time-frames (e.g. weeklies).
|
| > Could you recommend a base system / reference one could use
| to get started which explains or bakes in some of the feature
| / signals engineering work?
|
| The references I put in at the end of the post will really
| help with this! I might actually write out a separate blog
| post about starting out in this space from an ML perspective.
| Thanks for the idea!
| idoh wrote:
| If you have a tool that can generate great returns, then why fall
| back to a newsletter?
| muggermuch wrote:
| Great question.
|
| If I beat the market by 20% (say SPY generated 0% for the year,
| very optimistic at this point), and I have allocated $100k to
| this, I make $20k before taxes.
|
| That's less than minimum wage.
|
| Meanwhile, allocators expect a track record of at least 3-5
| years.
|
| Ideally, if I have an asset, I'd like to extract as much
| revenue as I can.
|
| Hope this makes sense.
| xapata wrote:
| If you're sitting on a gold mine, you can wait 5 years. This
| does not make sense.
| [deleted]
| M3L0NM4N wrote:
| You also don't know if your alpha is going to last 5 years.
| The gold mine can run out of gold.
| beambot wrote:
| OP could parlay this experience into a high-paying finance
| job. Algorithmic edge tends to be short lived.
| muggermuch wrote:
| Indeed. I haven't shut down development, just shut down the
| newsletter. I'm continuing to work on it.
| shadycuz wrote:
| I'm a self proclaimed world class DevOps engineer. Can I
| help contribute in order to get access to the model?
| muggermuch wrote:
| :) hmu on LinkedIn!
| YetAnotherNick wrote:
| Learn about hedging. Basically, for $100k, if your prediction
| could consistently beat some index, you don't just buy a
| stock, but you also sell (short) some other stock/index at the
| same time. So you own net 0 worth of stock but you get the
| difference in the increase as your profit. Obviously, in the
| real world you would need some sort of deposit, but you could
| bet millions for $100k.
| seanhunter wrote:
| You're talking about both hedging and leverage and this is
| a very important difference.
|
| Turning a long-only equity strategy into a long/short
| strategy or an "outperformance" strategy[1] with added
| leverage can seriously affect the volatility of returns and
| the risk of ruin so it's really important to understand
| well before embarking on this, because it will affect
| position sizing and a bunch of other things. You can indeed
| bet millions for $100k, but if your strategy has 10%
| volatility unlevered you can get completely wiped out in
| doing so whereas the risk of ruin of the unleveraged
| strategy is far lower.
|
| [1] You could say long/short is where you long some things
| and short some other things generally whereas
| outperformance is where you long some things and
| specifically short an index. So in the latter case you are
| betting on the outperformance of your picks in particular
| and in the former you are just saying you have the ability
| to pick both things that go up and things that go down.
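|
| A rough sketch of the leverage arithmetic behind this warning.
| It uses the $100k of equity and 10% unlevered volatility from the
| comment; the $1M notional is a hypothetical stand-in for "bet
| millions", and treating volatility as scaling linearly with
| leverage is a simplifying assumption.

```python
def leverage_profile(equity, notional, unlevered_vol):
    """Summarize how leverage scales volatility and shrinks the wipeout move."""
    leverage = notional / equity
    wipeout_move = 1.0 / leverage  # adverse strategy return that erases all equity
    return {
        "leverage": leverage,
        "levered_vol": unlevered_vol * leverage,
        "wipeout_move": wipeout_move,
        "wipeout_in_sigmas": wipeout_move / unlevered_vol,
    }


# $100k of equity backing a hypothetical $1M of notional exposure.
profile = leverage_profile(equity=100_000, notional=1_000_000, unlevered_vol=0.10)
print(profile)
```

| At 10x, a one-standard-deviation adverse year is enough to erase
| the account, whereas the unlevered strategy would need a 100%
| loss.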
| Straw wrote:
| How difficult is it to get investors when you can show your
| model beats the market consistently?
|
| Of course, they have to check you're not trading a strategy
| with extreme tail risk, but here it sounds like that's not
| the case?
| muggermuch wrote:
| It's difficult. We made a lot of pitches.
| Investors/allocators require a fairly long track record and
| are extremely reluctant to fund (what they perceive to be)
| black box strategies.
| [deleted]
| sanp wrote:
| OP, what are you using to draw the diagrams? They look nice and
| are very readable.
| muggermuch wrote:
| Thank you!
|
| I used Excalidraw (https://excalidraw.com), and I highly
| recommend it! It gives me 'xkcd' vibes.
| alpineidyll3 wrote:
| My heart goes out to this author, but you can tell even by his
| first table that he doesn't quite understand the mathematics of
| financial markets, the purpose of a hedge fund, how they grow
| etc.
|
| 1) It's plain by quickly looking at the allocation of capital in
| investment firms, that AUM is not made by performance; it's
| marketing. At best people invest when they believe a person is
| connected to inside information. Saying you have an ML advisor is
| really just a pre-req to these people.
|
| 2) Is that allocation stupid? No, it's not, because the powers
| of mathematics, and by extension ML, are intrinsically limited
| for investment returns, which are fat-tailed </Taleb>. For
| example, this author quotes a realistic Sharpe (0.8), but didn't
| calculate the standard deviation of his Sharpe, which I would
| bet a large sum was _at least_ 0.8. I.e., he doesn't really know
| what his Sharpe is. This is because equity assets behave like
| Student-t distributions with a degree-of-freedom parameter ~2 or
| less </Mandelbrot, /Bergomi, /Gatheral etc.>. I.e., higher
| moments, such as the uncertainty in Sharpe, literally do not
| exist or converge and are unknowable. The only exception is if
| your strategy explicitly cuts off tails.
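|
| To put a rough number on the uncertainty in a Sharpe estimate,
| here is a sketch using the standard large-sample approximation
| for IID, roughly normal returns. That assumption is exactly what
| the fat-tail argument above disputes, so read the result as a
| lower bound on the uncertainty. The sample lengths are
| hypothetical.

```python
import math


def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year=252):
    """Approximate standard error of an annualized Sharpe ratio.

    Uses SE ~ sqrt((1 + SR^2 / 2) / n) on the per-period Sharpe,
    then rescales to annual units. Assumes IID returns.
    """
    per_period_sharpe = annual_sharpe / math.sqrt(periods_per_year)
    per_period_se = math.sqrt((1.0 + 0.5 * per_period_sharpe ** 2) / n_obs)
    return per_period_se * math.sqrt(periods_per_year)


# Hypothetical track records: one year and five years of daily returns.
se_one_year = sharpe_standard_error(0.8, n_obs=252)
se_five_years = sharpe_standard_error(0.8, n_obs=5 * 252)
print(se_one_year, se_five_years)
```

| Under these assumptions, one year of daily data at a Sharpe of
| 0.8 gives a standard error of about 1.0, and five years about
| 0.45. Fat tails would only widen this.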
|
| Once you understand 2) you begin to understand that there's no
| such thing as a real quant fund (ie a fund which truly makes
| money predictably using models) which doesn't trade a liquidity
| limited book that has quite advanced hedging. Wealthy people are
| aware of this, which is why the author can't market this product.
|
| If you're doing something silly like holding equities without
| tail risk control, you literally cannot be quantitatively
| investing. You are just slowly rediscovering what Kelly, Bergomi,
| Mandelbrot, Bernay's etc. realized with a little deep thought
| over pen and paper (while clumsily writing boilerplate software):
| that markets are entropy machines rougher than a normal
| distribution, and any gains come directly from information (see
| Kelly: "A New Interpretation of Information Rate").
|
| For a high latency (ms) market data feed, the returns on
| information are very very small. Markets are efficient.
| chollida1 wrote:
| Someone asked about how difficult it is to get outside
| investment....
|
| It's usually very difficult and it takes a lot of money to run a
| proper fund.
|
| Let's say you raise $50M. You can maybe charge 1 and 20, meaning
| you get 1% of assets each year for running the fund and 20% of
| profits.
|
| 1% of $50M (and keep in mind this is a large raise for someone
| without a track record on the sell side or inside another fund)
| gives you $500,000 a year to pay:
|
| - salaries (let's say you pay yourself $100,000 all in, plus the
| same for a single analyst)
|
| - a Bloomberg terminal $30,000 including data feeds
|
| - market data feeds: you need $25,000/year for basic market data
| and fundamental data that you are allowed to warehouse (you can't
| store data you get from the Bloomberg terminal).
|
| - rent $50,000/year for office space
|
| - outside lawyer fees and outside accounting fees $100,000/year
|
| - similar fees for someone to run your back office, roughly
| $100,000/year.
|
| And on the other side of expenses you have the money-making side
| of things, which, as the OP pointed out, isn't great. If you
| return 10% on the $50M, that gives $5M in profits; you get to
| keep 20% of that, so you keep $1M.
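|
| The arithmetic above can be sketched as follows. The dollar
| figures are the ones quoted in this comment; the function and key
| names are hypothetical, and the performance fee is taken on gross
| profit, as in the comment.

```python
def annual_fund_economics(aum, gross_return, costs,
                          mgmt_fee_rate=0.01, perf_fee_rate=0.20):
    """One year of a '1 and 20' fund: fee income against running costs."""
    management_fee = aum * mgmt_fee_rate
    profit = aum * gross_return
    performance_fee = max(0.0, profit) * perf_fee_rate  # no fee on losses
    total_costs = sum(costs.values())
    return {
        "management_fee": management_fee,
        "performance_fee": performance_fee,
        "total_costs": total_costs,
        "fee_minus_costs": management_fee - total_costs,
    }


# Annual costs as listed in the comment.
costs = {
    "salaries": 200_000,  # $100k each for the founder and one analyst
    "bloomberg_terminal": 30_000,
    "market_data": 25_000,
    "rent": 50_000,
    "legal_and_accounting": 100_000,
    "back_office": 100_000,
}
result = annual_fund_economics(aum=50_000_000, gross_return=0.10, costs=costs)
print(result)
```

| With these inputs the listed costs sum to $505,000, so the
| management fee alone roughly breaks even and any real income
| depends on the performance fee.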
|
| That allows you to bonus out yourself and analysts on good years.
| If you lose money one year then you get no bonus and have to
| bonus out the employees out of the retained earnings you kept
| from previous bonuses.
|
| It usually gets worse, as most funds have what's called a high
| water mark. This means you don't collect the performance fee
| until your fund gets back to the high water mark. So if you are
| down 10% one year you need to make that back before you start to
| make any performance fee, which is why most funds shut down if
| they go down more than 20%.
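|
| A minimal sketch of how a high water mark defers the performance
| fee. It tracks a per-unit NAV, ignores the management fee, and
| uses a hypothetical return path; real fund documents differ in
| the details.

```python
def performance_fees_with_hwm(start_nav, yearly_returns, perf_fee_rate=0.20):
    """Yearly performance fees, charged only on gains above the high water mark."""
    nav = start_nav
    high_water_mark = start_nav
    fees = []
    for yearly_return in yearly_returns:
        nav *= 1.0 + yearly_return
        gain_above_hwm = max(0.0, nav - high_water_mark)
        fee = gain_above_hwm * perf_fee_rate
        nav -= fee  # the fee comes out of the fund
        high_water_mark = max(high_water_mark, nav)
        fees.append(fee)
    return fees


def return_needed_to_recover(drawdown):
    """Return required to get back to the old peak after a fractional loss."""
    return drawdown / (1.0 - drawdown)


# Hypothetical path: down 10%, then up 10% twice.
fees = performance_fees_with_hwm(100.0, [-0.10, 0.10, 0.10])
print(fees, return_needed_to_recover(0.20))
```

| In this path no fee is earned in the first two years, because a
| 10% gain after a 10% loss still leaves NAV below the old peak.
| After a 20% drawdown the fund needs a 25% gain before fees
| resume.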
|
| As to raising money... Anyone can show a model that makes money.
| That doesn't mean it's easy to create a model; it's just that
| there are a lot of people capable of building such a model.
|
| It's the risk management that people with money are really
| looking for, and sadly that's just really hard to show out of a
| model, as part of the risk management is things like position
| sizing and showing your model doesn't pile into one asset class
| or trade correlated products.
|
| It bodes well for the OP that they talk about market regimes as,
| IMHO, this is one of the biggest risk management tools that
| aspiring traders ignore.
|
| And this risk management is why people ask for a track record of
| more than a year.
| muggermuch wrote:
| Thank you for this comprehensive response!
|
| I have often found myself struggling to explain the difference
| between building a strategy or trading system (which reduces to
| a technical/intellectual challenge) and running a hedge fund
| (essentially running a complex information-driven business).
|
| Your cost breakdown really puts matters into perspective.
|
| > it bodes well for the OP that they talk about market regimes
|
| I concur. Market regimes (modeling, detecting, reasoning about
| them) are too delicious of an intellectual puzzle to resist.
| HFguy wrote:
| This is actually way too optimistic.
|
| Your first 1-2 seed investors will:
|
| - Only pay 1 and 10 (1% fixed fee and 10% of PNL)
|
| - They will also get ownership of the actual fund management
| firm and will get that in the form of 20% of REVENUE (not
| equity, revenue, think about that)
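|
| A rough sketch of what these seed terms do to the manager's
| take. It reuses the $50M raise and 10% return from the parent
| comment and assumes, hypothetically, that all of the capital
| comes in on seed terms. Function and parameter names are
| illustrative only.

```python
def manager_revenue(aum, gross_return, mgmt_fee_rate, perf_fee_rate,
                    seed_revenue_share=0.0):
    """Fee revenue left to the manager after the seeder's revenue share."""
    management_fee = aum * mgmt_fee_rate
    performance_fee = max(0.0, aum * gross_return) * perf_fee_rate
    revenue = management_fee + performance_fee
    return revenue * (1.0 - seed_revenue_share)


# Standard "1 and 20" with no revenue share.
standard = manager_revenue(50_000_000, 0.10, 0.01, 0.20)
# Seed terms: "1 and 10", with 20% of revenue going to the seeder.
seeded = manager_revenue(50_000_000, 0.10, 0.01, 0.10, seed_revenue_share=0.20)
print(standard, seeded)
```

| Under these assumptions the manager's revenue falls from $1.5M
| to $0.8M before any costs are paid.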
|
| This is one reason new fund formation is way down. The
| economics are bad for years. Know a bunch of HF people that
| started vc-backed tech firms instead.
|
| The other reason is 10+ year run where stocks, bonds, private
| firms and real estate just went up. No need for diversifying
| return streams.
| HFguy wrote:
| BTW, data costs also too low.
|
| Just a BB terminal around 30k and a lot of extra data from BB
| costs extra (can be 200-300k per additional product).
|
| For quant strategy probably looking at 500k up to 2M for data
| initially. And you will likely be at a disadvantage to
| existing firms that have been collecting data for years.
|
| And that is at the low end. Spent many millions per year for
| 1 strategy at last large firm. And that was small fraction of
| total firm spend.
| rmah wrote:
| Working in the industry, I can confirm that the above numbers
| are approximately correct except for the employee costs --
| those are roughly double and up. You also need to hire a fund
| administrator, auditors and compliance firms (maybe $50k to
| $100k per year each) which add on even more costs. And you
| can't skip the lawyers, outside administrator, outside
| compliance, etc. as they are required by regulations/law.
| prabdude wrote:
| Excellent article
| muggermuch wrote:
| Thanks!
| unpwn wrote:
| Lmao this engine is down 6.9% for the year, when literally it's
| as simple as just buying some puts.
| Jabbles wrote:
| You realise that puts have a cost that is determined by the
| market?
| ramesh31 wrote:
| > this engine is down 6.9% for the year
|
| That's pretty damn good, and still beating the market by nearly
| 20%. Of course you can always make more with riskier
| strategies.
| [deleted]
| antognini wrote:
| Have you considered submitting your predictions to Numerai
| Signals? It's market neutral, so as long as your models can
| generate some alpha you can still get good returns.
| muggermuch wrote:
| That's a good idea. I'll try it out, thanks!
___________________________________________________________________
(page generated 2022-09-27 23:00 UTC)