[HN Gopher] The $440M software error at Knight Capital (2019)
___________________________________________________________________
The $440M software error at Knight Capital (2019)
Author : bfm
Score : 119 points
Date : 2022-05-02 18:36 UTC (4 hours ago)
(HTM) web link (www.henricodolfing.com)
(TXT) w3m dump (www.henricodolfing.com)
| inter_netuser wrote:
| Peanuts, just a regular day in DeFi.
|
| https://rekt.news
| Terry_Roll wrote:
| I read this
|
| >Under stock exchange rules, Knight would have been required to
| pay for those shares three days later. However, there was no way
| it could pay, since the trades were unintentional and had no
| source of funds behind them. The only alternatives were to try to
| have the trades canceled, or to sell the newly acquired shares
| the same day.
|
| And then I understand why /r/WallStreetBets and /r/Antiwork are
| gaining traction.
|
| All it takes is a bit of organisation and the adoption of Govt
| tactics and practices, which are ultimately violence, and then just
| maybe you might see a Govt that works for the people and not the
| criminals, but I can't picture Bernie Sanders wielding a
| pitchfork!
|
| Still, I see Musk was market making with his tweet. I don't think
| you can be any more blatant! LOL
| https://twitter.com/elonmusk/status/1520650036865949696?cxt=...
| NovemberWhiskey wrote:
| (2019)
| bfm wrote:
| Updated the title
| randomhodler84 wrote:
| Back in the day, a $440M loss due to a coding error was a landmark
| warning case. How could this happen??
|
| In 2021 alone, something like $10B was lost due to bugs in DeFi
| land.
|
| Whatever the worst possible thing is, it tends to happen
| eventually, and it gets worse every passing year.
| pingeroo wrote:
| Was just about to comment along these lines. If I had read about
| this a few years ago I would have been shocked. Now, after seeing
| so many flubs in the crypto space, my reaction is just 'meh'.
| vmception wrote:
| I actually always think of the Knight case and similar ones
| when people see a DeFi organization have an issue and
| extrapolate that to an issue with the entire DeFi concept.
|
| It's so obvious that those people have no clue what's going on in
| the markets they respect. Truth be told, many of them don't like
| markets at all. So it's just a lack of exposure and compounded
| ignorance.
| colechristensen wrote:
| Many traditional finance issues are fixable, though; there are
| many more errors that don't become big stories because they
| are reasonably reversed and end up as only minor inconveniences.
| vmception wrote:
| Like how Credit Suisse is going to reverse their Bill Hwang
| losses? I guess in this conversation we can't distinguish
| between irreversible losses of asset value or liquidity and
| misdirected transactions that have partial reversibility.
|
| Similarly, maybe you/they just don't see the headlines about
| attacks in DeFi that were thwarted specifically due to design
| considerations.
|
| I'll take the permission to fail. To me, the rapid iteration
| creates some really fascinating systems in very short time
| periods. One project implodes, 100 (or 1000) more harden,
| bigger money comes in creating more assurances for users like
| easier recovery and compensation paths, all while continuing
| to rapidly iterate.
| nikanj wrote:
| Those $440M were lost by rich people who had invested in a
| hedge fund, not poor people who bought crypto lottery tickets
| in the hopes of getting rich quick
| bfm wrote:
| The OP details how poor software engineering practices brought
| down a $1.4B market maker with 1,400 employees in 2012.
|
| Some of the issues mentioned include:
|
| - Keeping synthetic test data generation as part of a
| production build.
| - Keeping dead code for years.
| - Re-purposing a feature flag.
| - Refactoring without regression tests.
| - Manual deployments without peer reviews. They forgot to
| update one of their servers with the new code.
| - Automated alerts sent via email were ignored.
| - Rolling back to a version of the code running on the server
| they forgot to update, making things worse.
| - Rushing out a release without proper software engineering
| hygiene.
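|
| The missed-server bullet is something a deploy script can check
| mechanically. Below is a minimal sketch in C, assuming each server
| can report the build ID it is running; the server_status struct,
| the build strings, and flag_may_be_enabled are hypothetical
| illustrations, not anything from Knight's actual tooling. It
| refuses to enable a new flag unless every server reports the
| expected build.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-server status as reported by a deploy agent. */
struct server_status {
    const char *host;
    const char *build_id; /* NULL if the server did not answer */
};

/* Returns true only if there is at least one server and every server
 * answered with exactly `expected`. On failure, *first_bad (if not NULL)
 * receives the index of the first offending server; it is left untouched
 * when the fleet is empty. */
static bool flag_may_be_enabled(const struct server_status *servers,
                                size_t n, const char *expected,
                                size_t *first_bad)
{
    assert(expected != NULL);
    if (n == 0)
        return false; /* nothing verified, so nothing enabled */
    for (size_t i = 0; i < n; i++) {
        const char *b = servers[i].build_id;
        if (b == NULL || strcmp(b, expected) != 0) {
            if (first_bad != NULL)
                *first_bad = i;
            return false; /* fail closed: one stale box blocks the flag */
        }
    }
    return true;
}
```

| Failing closed on a silent or stale server is the point: in the
| Knight case the flag was used while one of the eight servers
| still ran the old code.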
|
| The article suggests improvements that could have prevented the
| chain of events.
|
| For those here who are in HFT circles, have things improved after
| the Knight Capital Group debacle?
|
| edit: formatting
| rebelos wrote:
| Some of this is unforgivable, but reflecting on it I also
| realized that software engineering at quant firms has an almost
| impossible mandate. You want something akin to the extreme
| rigor of mission critical software (airplanes, cars, NASA,
| etc), while also remaining nimble enough to modify strategies
| as market conditions rapidly evolve.
| SilasX wrote:
| Same is true for blockchain smart contracts, which can have
| similar catastrophic consequences.
| ChrisClark wrote:
| That truly is scary to me. I can easily* write advanced
| Solidity and could try to make something big. But I won't,
| because I know I would not be able to handle the stress and
| responsibility. One tiny logic error and millions lost.
| Thanks but no thanks.
|
| *The fact I believe I could easily do it is probably
| exactly why I'd end up making some huge mistake. ;)
| posterboy wrote:
| That's a weird statement.
|
| The extreme rigor on the one hand seems to require a value
| judgement of the real benefits to HFT that I'm not willing to
| make. The remaining nimble'ity, on the other hand, is an odd
| word to use over _agility_ or old fashioned _responsibility_.
| The benefit is proportional to it, but not exclusively.
|
| The rapidly evolving market conditions concern regular trade
| too. Swift reactions are expected in any other systems
| application. "almost impossible" is a weasel word. It's
| almost impossible to win except for the last man standing, is
| that it? And there's no practical upper limit to nimble'y,
| though conservative estimates indicate that less work is
| more.
|
| What's missing is the perverse incentives, corrupt policies,
| sociopathic leadership, ...
| nradov wrote:
| Why unforgivable? It's only numbers in an account. No one
| died.
| bfm wrote:
| It is challenging, although with financial markets it seems
| like it would be simpler to have some automatic anomaly
| detection mechanism that unplugs or slows things down to
| prevent further damage.
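|
| One simple form of that idea is an order-rate guard. A minimal
| sketch in C, assuming a monotonic nanosecond clock supplied by the
| caller; rate_guard, its limits, and the latch-until-reset policy
| are illustrative choices, not a description of any firm's system.
| It blocks an order that would push the count past max_orders
| within window_ns, and once tripped it stays tripped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RATE_WINDOW_CAP 64 /* hypothetical upper bound on max_orders */

struct rate_guard {
    uint64_t stamps[RATE_WINDOW_CAP]; /* ring buffer of recent send times */
    unsigned head;                    /* index of the oldest stamp */
    unsigned count;                   /* stamps currently in the ring */
    unsigned max_orders;              /* orders allowed per window */
    uint64_t window_ns;
    bool tripped;                     /* latched until a human resets it */
};

static void rate_guard_init(struct rate_guard *g, unsigned max_orders,
                            uint64_t window_ns)
{
    assert(max_orders > 0 && max_orders <= RATE_WINDOW_CAP);
    g->head = 0;
    g->count = 0;
    g->max_orders = max_orders;
    g->window_ns = window_ns;
    g->tripped = false;
}

/* Call before sending each order; returns false if it must be blocked.
 * Assumes now_ns never goes backwards between calls. */
static bool rate_guard_allow(struct rate_guard *g, uint64_t now_ns)
{
    if (g->tripped)
        return false;
    /* Drop stamps that have slid out of the window. */
    while (g->count > 0 && now_ns - g->stamps[g->head] >= g->window_ns) {
        g->head = (g->head + 1) % RATE_WINDOW_CAP;
        g->count--;
    }
    if (g->count >= g->max_orders) {
        g->tripped = true; /* too many orders too fast: pull the plug */
        return false;
    }
    g->stamps[(g->head + g->count) % RATE_WINDOW_CAP] = now_ns;
    g->count++;
    return true;
}
```

| Latching matters: a guard that re-armed itself as the window slid
| would let a runaway loop keep trading in bursts.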
| WJW wrote:
| There are a lot of preventative measures they could have
| taken, starting with just not leaving in dead code and
| paying attention to automated alerting. But the moral of
| the story is that they got away with it for so long that
| nobody cared about it anymore. After all, if it were truly
| a big deal, why hadn't it broken years earlier? Then, when
| the technical debt finally got called in, it bankrupted the
| entire firm in one go.
|
| Most of us (hopefully) have less devastating technical debt
| to deal with, but it is still a cautionary tale about what
| could happen if you ignore it for too long.
| pclmulqdq wrote:
| I used to work in HFT. I have seen highly variable practices in
| this area, including a "mini-Knight" incident in the
| single-digit millions due to tech debt and poor test coverage.
| However, the most useful change that has resulted from the KCG
| debacle was adding several layers of kill switches, a dedicated
| ops team to watch trading and flip the kill switches, and
| embracing devops automation.
|
| There is a much more serious focus on having a defense in
| depth, and making sure that problems like this are noticed
| before they become an issue. Rollbacks are no longer the first
| action when something goes wrong: the kill switch comes first.
|
| Dead code, tech debt, repurposed flags, and spotty test
| coverage are everywhere still.
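|
| The "several layers of kill switches" idea can be sketched as a
| shared halt word that any layer may set and the order path checks
| before every send. This is a minimal C11 sketch with three
| hypothetical layers (strategy, gateway, ops desk); the names and
| the bitmask design are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical independent layers that can each halt trading. */
enum kill_layer {
    KILL_STRATEGY = 1 << 0, /* the strategy's own self-checks */
    KILL_GATEWAY  = 1 << 1, /* order gateway / risk server */
    KILL_OPS      = 1 << 2  /* a human on the ops desk */
};

struct kill_switch {
    atomic_uint halted; /* bitmask of layers that have pulled the plug */
};

static void kill_init(struct kill_switch *k)
{
    atomic_init(&k->halted, 0u);
}

/* Any layer may halt at any time; the halt stays until that layer clears it. */
static void kill_halt(struct kill_switch *k, enum kill_layer layer)
{
    atomic_fetch_or(&k->halted, (unsigned)layer);
}

/* Clearing is per layer: the ops desk clearing its own bit does not
 * override a halt raised by the gateway. */
static void kill_clear(struct kill_switch *k, enum kill_layer layer)
{
    atomic_fetch_and(&k->halted, ~(unsigned)layer);
}

/* Checked on the order path before every send. */
static bool kill_may_send(struct kill_switch *k)
{
    return atomic_load(&k->halted) == 0u;
}
```

| Each layer clears only its own bit, so an ops reset cannot
| silently override a halt raised by the risk gateway, and the hot
| path costs one atomic load per order.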
| aaronharnly wrote:
| I'm curious about the "repurposed flags" part.
|
| I wouldn't think of flags as expensive / effortful to make
| more of, but clearly they must be if people are tempted to
| reuse them. Can you help me understand what is meant by a
| flag in this context, and why it would be repurposed?
| isogon wrote:
| Repurposing flags is not always well-motivated, but one
| legitimate reason to do this is the memory (and
| particularly cache) footprint.
|
| Often flags are local to a particular object. If there are
| lots of such objects, you want each to take as little space
| as possible. You should check out the contortions Linux
| devs go through to make struct page small [0]. This is
| important, because there is one such struct per page of
| physical memory. The memory use is a near-constant
| percentage of your total memory, and you wouldn't want it
| to be any larger than necessary.
|
| Even when there are not a lot of these objects, in low-
| latency software it's important to hit the cache. Your
| program should always just be as compact in memory as
| possible.
|
| Semantically flags are booleans (is proposition P true of
| this object). They are stored compactly as bitsets, often
| implicitly, say:
|
|     #define FLAG_1 0x01
|     #define FLAG_2 0x02
|     /* ... */
|     #define FLAG_8 0x80
|
|     struct order {
|         u32 qty;
|         u16 id;
|         u8  type;
|         u8  flags;
|     };
|
| This struct will fit into 8 bytes. This is great, as you
| probably won't waste space on alignment in many cases -- 8
| is a good multiple. But if you wanted to add FLAG_9 here,
| your flags would become a u16, and your struct would,
| frustratingly, stop fitting into 8 bytes. To avoid this,
| one might repurpose flags.
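|
| A sketch that makes the size argument checkable, using <stdint.h>
| fixed-width types in place of the kernel-style u32/u16/u8 (an
| assumption; the comment doesn't say which typedefs it means).
| order_wide is a hypothetical version with a 16-bit flags field.
| The static assertions hold on common ABIs such as x86-64 and
| AArch64, where uint32_t is 4-byte aligned; other platforms may
| differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The struct from the comment, with <stdint.h> types for u32/u16/u8. */
struct order {
    uint32_t qty;
    uint16_t id;
    uint8_t  type;
    uint8_t  flags; /* room for exactly 8 one-bit flags (FLAG_1..FLAG_8) */
};

/* Hypothetical version after widening flags to make room for FLAG_9. */
struct order_wide {
    uint32_t qty;
    uint16_t id;
    uint8_t  type;
    uint16_t flags; /* 2-byte alignment inserts a padding byte before this */
};

/* Expected layout on common ABIs: the wide struct has 9 bytes of fields
 * plus 1 padding byte, rounded up to 12 by the 4-byte alignment of qty. */
_Static_assert(sizeof(struct order) == 8, "original fits in 8 bytes");
_Static_assert(offsetof(struct order_wide, flags) == 8, "pad after type");
_Static_assert(sizeof(struct order_wide) == 12, "widened no longer fits");
```

| One extra flag costs four bytes per order here, a 50% increase,
| which is the pressure that makes reusing an existing bit tempting.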
|
| Another example of this is intrusive flagging, using, for
| example, the high or low bits of a pointer aligned to 2^n
| bytes. If you run out of bits there, not much you can do.
|
| [0] https://github.com/torvalds/linux/blob/master/include/linux/...
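|
| Pointer tagging, the intrusive flagging described above, as a
| minimal C11 sketch: objects aligned to 8 bytes have three
| always-zero low address bits, which can hold three one-bit flags.
| The struct node type and helper names are hypothetical, and
| round-tripping a pointer through uintptr_t is
| implementation-defined, though mainstream compilers support it.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* 8-byte alignment leaves the low 3 bits of every address at zero. */
#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1u))

struct node {
    alignas(8) int value;
};

/* Pack a small tag (0..7) into the low bits of an aligned pointer. */
static uintptr_t tag_ptr(struct node *p, unsigned tag)
{
    uintptr_t bits = (uintptr_t)p;
    assert((bits & TAG_MASK) == 0); /* the pointer really is aligned */
    assert(tag <= TAG_MASK);        /* only 3 bits: no room for a 4th flag */
    return bits | tag;
}

/* Recover the original pointer by masking the tag bits off. */
static struct node *untag_ptr(uintptr_t t)
{
    return (struct node *)(t & ~TAG_MASK);
}

/* Read back just the tag bits. */
static unsigned ptr_tag(uintptr_t t)
{
    return (unsigned)(t & TAG_MASK);
}
```

| The tag <= TAG_MASK assertion is the "not much you can do" limit:
| a fourth flag simply does not fit.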
| pclmulqdq wrote:
| This is pretty much why flags get repurposed. It's also
| important to mention that things like JSON and protobufs
| are too expensive for HFT, so you are likely going to be
| sending structs over the wire. Repurposing flags lets you
| change a wire format with a lot less friction than adding
| a byte to a struct. Essentially, it lets you change the
| minor version number on a protocol and only recompile the
| endpoints without changing the major version number and
| recompiling everything.
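|
| A minimal C sketch of an 8-byte wire message. Rather than copying
| the raw struct (which bakes in host padding and byte order), it
| packs fields explicitly as little-endian bytes; that swap, the
| field layout, and the flag names FLAG_IOC and FLAG_BIT7_V1_1 are
| assumptions for illustration. Reassigning a flag bit changes no
| offsets, which is why it is the low-friction path described above.

```c
#include <assert.h>
#include <stdint.h>

#define ORDER_WIRE_SIZE 8

/* Hypothetical flag bits. Bit 7 is reassigned in minor version 1.1;
 * an endpoint still on 1.0 reads it with whatever meaning it had there. */
#define FLAG_IOC       0x01u
#define FLAG_BIT7_V1_1 0x80u

struct order {
    uint32_t qty;
    uint16_t id;
    uint8_t  type;
    uint8_t  flags;
};

/* Encode into a fixed little-endian layout, independent of host padding. */
static void order_pack(const struct order *o, uint8_t out[ORDER_WIRE_SIZE])
{
    out[0] = (uint8_t)(o->qty);
    out[1] = (uint8_t)(o->qty >> 8);
    out[2] = (uint8_t)(o->qty >> 16);
    out[3] = (uint8_t)(o->qty >> 24);
    out[4] = (uint8_t)(o->id);
    out[5] = (uint8_t)(o->id >> 8);
    out[6] = o->type;
    out[7] = o->flags; /* new flag meanings change no offsets */
}

/* Decode the same layout back into a host struct. */
static void order_unpack(const uint8_t in[ORDER_WIRE_SIZE], struct order *o)
{
    o->qty = (uint32_t)in[0] | (uint32_t)in[1] << 8 |
             (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
    o->id = (uint16_t)(in[4] | in[5] << 8);
    o->type = in[6];
    o->flags = in[7];
}
```

| The catch is that an endpoint still on the old minor version
| decodes the same byte with the bit's old meaning, so "only
| recompile the endpoints" has to mean every endpoint.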
| commandlinefan wrote:
| > poor test coverage
|
| Yet you don't have to hang around here long to be told that
| "Unit Testing is Overrated": https://tyrrrz.me/blog/unit-testing-is-overrated
| kevstev wrote:
| I worked in algo trading for years and eventually got out
| because, quite frankly, the level of risk I was carrying on my
| shoulders every day for what I was being paid was just way out
| of whack. I, at least personally, never got the huge paydays
| that people talked about until after I left finance for more
| pure tech. Interestingly, I worked at Knight and my team
| pioneered trying to blow up the firm, but that was in 2004, and
| things were much friendlier: instead of front-page news, it was
| a small blurb on page 3 of the markets section of the WSJ.
|
| Anyway, I still have friends in that business. It hasn't really
| changed: they have too few people covering systems that are
| quite complex, and while there are checks and such, no one
| really understands things end to end in enough detail to
| prevent all problems.
|
| I will never invest directly in an investment bank: either
| through carelessness or maliciousness I could have easily
| caused a 9 figure loss, if not more, and there were probably a
| thousand other people in the same position.
|
| When I read the detailed writeup on this a few years back, I
| thought by far the biggest issue was reusing a tag that had
| previously been used to denote which strategy to use. I
| understand why they may have chosen to do so: at the Big Bank I
| was working at, getting a new FIX tag passed through all the
| layers properly would involve at least two other teams,
| coordinated releases, and probably several weeks' worth of
| meetings. If you just reuse an old value you can avoid all
| that, since everything is already set up.
| sjtindell wrote:
| I appreciate your comment about pay. Recruiters will often
| tell me "it's finance so of course the pay will be
| substantial." Then when we get to talking numbers they're
| like "300k a year". Oh, you mean the going rate at a FAANG?
| And I have to move to New York or Chicago, work more hours,
| and actively work for people who I know are taking home
| paychecks with 7+ zeroes on them? Come on. Sometimes it's 400
| plus bonus or whatever, which is based on fund performance
| and yada yada. But it feels way off. I had heard so much
| about the staggering paydays at these places, but it seems you
| need an ML PhD or some trading chops to be part of that.
| caffeine wrote:
| The attitude that finance pays more is a leftover from a
| previous era. 10-15 years ago it was true: the profits from
| HFT were also way, way bigger and split up amongst a
| much smaller group of firms.
|
| Now those firms are all in a completely competitive
| industry squeezing each other for basis points.
|
| Meanwhile the definition of a FAANG is that it has an
| effective monopoly, and these companies are taking in way
| more money than the HFT industry. (Netflix is losing its
| monopoly but we can't really drop N from the acronym
| without a replacement..)
| 22SAS wrote:
| Tbf, most of us prefer to be called market makers rather
| than HFTs. Different name, but we still use the same
| ultra-low-latency techniques to get the job done.
| spacemanmatt wrote:
| > but we can't really drop N from the acronym without a
| replacement
|
| Huh, yeah. That would be quite a GAAF. Gotta come up with
| something before Netflix is forced out of the FAANG club.
| snotrockets wrote:
| I've seen MAAM being used.
| gjs278 wrote:
| asjre34marakf wrote:
| Why pay more than market rate of a replaceable ML person?
|
| Is there any realistic path for a demonstrably smart and
| hardworking person into that 7+zeros club? Evidence
| suggests no: leetcode grinders and FAANGers are not in that
| club, and most of them will never even make it into the
| 6+zeros club. Net wealth -- sure, but not income.
| 22SAS wrote:
| It's all about making $$ for the firm. If the strategies
| developed are very profitable then 7-figures is
| definitely reachable for the researchers at a prop
| trading firm.
| isogon wrote:
| I cannot confirm this. ~300k is the pay (excluding sign-on)
| fresh out of college at a big HFT firm; sufficiently senior
| devs make 7 figures.
| hatesinterviews wrote:
| At our firm, the numbers are similar: $600k TC for new
| grads ($200k base, $100k minimum first year bonus, $300k
| signing bonus)
| 22SAS wrote:
| WTF! I am at an HFT firm in Chicago, and this is insane. This
| seems a lot like an offer from Radix, or Headlands, or
| maybe Algo Dev at HRT.
| isogon wrote:
| There is certainly much variance between the firms,
| especially the sign-on IME. People I know have turned
| down HRT core dev for big tech because their offers were
| unimpressive.
|
| I think an interesting target for comparison with big
| tech is Jane Street, since their culture and WLB are
| good, so the main QoL drawbacks of finance don't apply. A
| new grad will get ~300k at Jane Street, though probably
| not with this large a sign-on.
| 22SAS wrote:
| This is interesting; I didn't know HRT Core Dev offers
| were below FAANG. My understanding is that core devs are
| basically the folks who work on all the low-latency stuff,
| so they'd be paid pretty well.
|
| Jane Street, from what I recall, is 300K (base + bonus)
| and 125K sign-on, and also it is non-negotiable. No idea
| what their numbers are like for experienced hires from
| competitors.
| 22SAS wrote:
| Honestly, that depends on the firm. There are some that
| do pay very well like this, e.g., HRT, Jane Street (they
| are not an HFT though), Headlands, Radix. At some others,
| like Jump and Optiver, the pay varies depending on whether
| it's front office or back office.
|
| Where I work, the new grad offers are slightly better
| than FAANG, but the performance-based growth is very
| good, and we also pay very well for people coming in
| from a competitor.
| kevstev wrote:
| Yeah, pay at the big banks is shit really, especially when
| you consider the utter lack of work/life balance. I left in
| 2013 making 150k, which was supposed to be supplemented by
| a ~40% bonus for the level I was at, but each year was
| "well, it's been a tough year...". After getting a token
| amount one year, and then zeroes the next 2, after working
| 50-60 hour weeks, I was like, I am not only done with this
| place, but with this industry, and left for a 50% pay raise.
| My TC is now 4x where it was in those days. A neighbor of mine
| is more or less sitting in my exact seat there, and is
| somewhere in the 200-250k range.
|
| That said, I went back to finance to work at one of the
| premier hedge funds out there, and they actually lived up
| to expectations in terms of comp. That place was more like a
| tech firm than any other firm I have ever worked at, aside
| from maybe Knight. 8% annual increases were
| normal there. You can look in my post history back to 2018
| if you want the name, I recently left after 5 years there
| and just want to stay out of their crosshairs- they monitor
| social media aggressively and there is deferred comp at
| stake.
|
| At big banks, there are really only a very small number of
| people in tech who are getting paid. You have to know
| which questions to ask: where is the bonus pool coming
| from? Are you "in the business" or in the tech pool, which
| is second-class citizenship? I would have to be in a pretty
| bad place to ever consider going back to a bank; it was
| borderline abusive... always dangling the prospect of that
| big check that would make it worth it.
| rosege wrote:
| I spent a few years at an investment bank, not in the US,
| and the only people on serious money were some of the top
| managers. My overall opinion of these people was that they
| did very little, but the people lower down whom I met were
| some of the most talented people I ever worked with.
|
| The top ones would spend all their time traveling the
| world to the offices, meeting with staff in each
| location, and then sending emails to the rest of the
| department about what the staff in that location were
| working on. They would harvest ideas from the staff as
| they went and then present them as their own, or approve
| projects that staff had suggested to them. I really
| didn't see how they were worth the $5M they were earning,
| since they didn't come up with the ideas for what would
| be done and didn't do any real work.
| 22SAS wrote:
| Most quantitative hedge funds and prop trading firms are
| now following a very tech like culture since they realize
| now that technology is just as important as the
| strategies. To get the best engineers, especially from
| FAANG, they need to have a similar culture otherwise
| they'll have a hard time getting new hires.
| benjaminwootton wrote:
| I worked in a lot of front office groups in investment banking.
| The short spell I did in HFT had great software development and
| DevOps practices.
| idohft wrote:
| Hard to speak for HFT in general. Like in software, different
| firms have different levels of hygiene. About half of your
| bullet points were true of my previous employer, at my time of
| leaving.
| aledalgrande wrote:
| This is all basic stuff I look to set up in every team, and
| given how these firms work directly with tons of money, it's
| crazy that they don't have an even higher standard. Guess I
| wasn't wrong turning down these roles.
| bnastic wrote:
| I remember the Knight Cap event, I was working on order routing
| at the time.
|
| Things have changed a lot since 2012, and at the same time
| haven't. Circuit breakers and position monitoring are no.1 in
| any sane market making firm. What happened then I can't imagine
| happening now (accumulating a huge position for, what was it,
| 30 minutes? With nobody killing the algos within a couple of
| minutes?). On the other hand, the perfect world of "code
| hygiene" and 100% test coverage will never exist in this world,
| things will slip and they do frequently. What's better,
| externally, is the availability of good tools for development
| and change reviews (bitbucket taking hold, for example),
| automated deployments, containers, testing frameworks and
| similar. This type of software, end to end, is incredibly
| complex and difficult to reason about when the unexpected
| happens (there was a TTL misconfig for multicast and we never
| got such and such update? Well, no one thought of that!),
| especially these days with the influx of ML algos for price
| generation.
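|
| Position monitoring in its simplest form, as a C sketch: a
| per-symbol net position updated from fill reports and compared
| to a hard limit. The struct, field names, and sticky-breach policy
| are hypothetical; a real risk system would likely track more than
| this (notional exposure, per-account limits, and so on).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-symbol position monitor fed by fill reports. */
struct position_monitor {
    int64_t net_shares;     /* positive = long, negative = short */
    int64_t max_abs_shares; /* hard limit set by risk */
    bool    breached;       /* sticky once the limit has been crossed */
};

/* Apply one fill: qty > 0 for buys, qty < 0 for sells.
 * Returns false once the absolute position has exceeded the limit,
 * and keeps returning false even if later fills reduce it. */
static bool position_on_fill(struct position_monitor *m, int64_t qty)
{
    m->net_shares += qty;
    int64_t abs_pos = m->net_shares < 0 ? -m->net_shares : m->net_shares;
    if (abs_pos > m->max_abs_shares)
        m->breached = true;
    return !m->breached;
}
```

| Driving the check from fills rather than from the strategy's
| intent means it still fires when the order-sending code is
| itself what's broken, which was the Knight failure mode.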
| 22SAS wrote:
| Currently work at an HFT firm. Most of the firms invest well
| into good DevOps, Trading Systems and SRE teams, to ensure that
| everything from installing a trading server at the colocation
| facility, to CI/CD and making changes to the systems configs,
| is done well. There are also guards in place to ensure that if
| the system starts making trades that look way too odd, it
| pulls the plug and goes down immediately.
|
| Also, any code that does not need to be there is promptly
| removed.
|
| Where I work, we have a few people from KCG, i.e., the firm
| formed when Knight Capital merged with GETCO after this
| incident. Sometimes this incident is brought up, although I
| don't think any of them ever worked for Knight Capital before
| the incident.
| bob1029 wrote:
| Repurposing feature flags is some kind of next dimension horror
| for me. We've got quite a few of these to deal with, and if
| someone started changing what they mean we'd be fucked super
| fast. Simply _suggesting_ that we alter the meaning of an
| existing FF would result in the resignation of a non-zero
| number of project managers on my team.
|
| Rolling back code is another thing I have no tolerance for
| anymore. The only option we entertain these days is a
| roll-forward. If your software takes so long to iterate/build
| that you need to go back to an old version in an emergency, you
| need to review your languages/tools/frameworks/processes. We
| maintain a contractual obligation to our customers for same-day
| code updates (in cases of production/regulatory emergencies)
| because we have enough confidence in our processes.
| robofanatic wrote:
| At the end of the day, it's just money going from one account
| to another, right? It's not like some physical thing that has
| perished and can't be brought back. Why is it difficult to
| reverse the transactions?
| ceejayoz wrote:
| Because those transactions cause other transactions, which
| cause others, and so on and so forth. You'd have to reset the
| market for the day.
|
| Imagine how pissed you'd be if you made money off Knight's
| mistake and it all just disappeared the next day.
| strgcmc wrote:
| Except, well, cancelling transactions obviously does happen,
| sometimes: https://www.reuters.com/business/lme-suspends-nickel-trading...
|
| Knight was probably too messy to rollback cleanly, but that
| just means it's a matter of cost/complexity/politics... if
| you're a big enough player, then the exchange will do you
| favors, like in the LME case.
|
| Free markets, lol
| bfm wrote:
| From the OP:
|
| > Rules were established after the "flash crash" of May 2010
| > to govern when trades should be canceled. Knight's buying
| > binge did not drive up the price of the purchased stocks by
| > more than 30 percent, the cancellation threshold, except for
| > six stocks. Those transactions were reversed. In the other
| > cases, the trades stood.
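|
| The arithmetic of the quoted 30 percent threshold, as a C sketch.
| This is a simplified reading: it assumes a single known reference
| price and ignores however the actual rule defines that reference;
| exceeds_cancel_threshold is a hypothetical helper. Prices are
| integer cents, and the comparison is cross-multiplied to avoid
| division and rounding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if trade_cents is MORE than threshold_pct percent away from
 * ref_cents, in either direction ("more than", so exactly equal is false). */
static bool exceeds_cancel_threshold(int64_t ref_cents, int64_t trade_cents,
                                     int64_t threshold_pct)
{
    assert(ref_cents > 0);
    int64_t diff = trade_cents - ref_cents;
    if (diff < 0)
        diff = -diff;
    /* diff / ref > pct / 100  <=>  diff * 100 > pct * ref (all non-negative) */
    return diff * 100 > threshold_pct * ref_cents;
}
```

| For example, against a $10.00 reference, a fill at $13.00 is
| exactly 30 percent away and does not qualify, while $13.01 does.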
| ceejayoz wrote:
| > The LME announced that all trades will be voided from
| midnight until 8:15 a.m. on Tuesday when trading stopped
| and added that it was considering a closure of several
| days.
|
| > "People will be asking if this really a functioning
| market... This is meant to be a market of last resort and
| people can't get inventories to deliver against positions,"
| said Colin Hamilton, managing director of commodities
| research at BMO Capital Markets.
|
| There's gonna be a pretty high threshold for this sort of
| thing. Higher than "one company fucked up and wants a do-
| over".
| rubyskills wrote:
| This is much easier to do in a centralized futures market.
| I can't imagine a rollback in stocks being easy or
| possible.
| rubyskills wrote:
| Exactly this. If you're a market maker, likely your trades
| impact your own trades too. As you accumulate a position,
| your average price is going up with it. Trades should not
| just roll back because one large hedge fund screwed up.
| Imagine being a retail trader with that expectation. Would be
| nice!
| anamax wrote:
| > Why is it difficult to reverse the transactions?
|
| Why should the transactions be reversed?
|
| If things had gone according to plan, Knight would have made
| several million dollars that day, some likely because of a
| mistake by someone else or an unavoidable circumstance, just
| like it did on other days.
|
| Those other people weren't made whole, so why should Knight be
| any different?
| user3939382 wrote:
| Here's a 225 million dollar oopsie from 2005
| https://www.foxnews.com/story/typing-error-causes-225m-loss-...
| bfm wrote:
| Today there was a 300B oopsie in Europe caused by a Citibank
| "glitch": bloomberg.com/news/articles/2022-05-02/citi-s-london-trading-desk-behind-rare-european-flash-crash
| chmod775 wrote:
| It was only a sudden drop in share prices, which quickly
| rebounded. The amount of money that actually changed hands
| due to that mistake will be tiny in comparison.
| nuclearnice1 wrote:
| Here's the same oops in 2001
|
| https://www.wsj.com/articles/SB1007117680496415760
| gzer0 wrote:
| _The incident happened after a technician forgot to copy the new
| Retail Liquidity Program (RLP) code to one of the eight SMARS
| computer servers, which was Knight's automated routing system
| for equity orders. RLP code repurposed a flag that was formerly
| used to activate an old function known as 'Power Peg'. Power Peg
| was designed to move stock prices higher and lower in order to
| verify the behavior of trading algorithms in a controlled
| environment. Therefore, orders sent with the repurposed flag to
| the eighth server triggered the defective Power Peg code still
| present on that server_ [1]
|
| > Power Peg was designed to move stock prices higher and lower in
| order to verify the behavior of trading algorithms in a
| controlled environment.
|
| This is insane. Makes one wonder what _is_ or _isn't_ actually
| being deployed in prod in 2022.
|
| [1]
| https://en.wikipedia.org/wiki/Knight_Capital_Group#2012_stoc...
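|
| A hypothetical C illustration of the failure mode in the quote,
| not Knight's actual code: two builds give the same flag bit
| different meanings, so which behaviour an order gets depends on
| which server receives it. The bit values, enum, and function names
| are invented for the example, as is the fail-closed alternative at
| the end, which gives the new feature a fresh bit and rejects bits
| the receiving build does not know.

```c
#include <assert.h>

/* Hypothetical bit: the old build reads it as "Power Peg" (dead test
 * code never removed); the new build reuses it to mean "route via RLP". */
#define FLAG_REUSED 0x04u

enum route { ROUTE_NORMAL, ROUTE_RLP, ROUTE_POWER_PEG };

/* Servers that received the new code. */
static enum route route_new_build(unsigned flags)
{
    return (flags & FLAG_REUSED) ? ROUTE_RLP : ROUTE_NORMAL;
}

/* The one server that was missed: same bit, old meaning. */
static enum route route_old_build(unsigned flags)
{
    return (flags & FLAG_REUSED) ? ROUTE_POWER_PEG : ROUTE_NORMAL;
}

/* Fail-closed alternative: the new feature gets a never-used bit, and a
 * build rejects any bit outside known_mask (the bits that build
 * understands) instead of guessing. The routing line models the new
 * build; for a stale build only the mask check matters.
 * Returns 0 and sets *out on success, -1 if the order must be rejected. */
#define FLAG_RLP_FRESH 0x08u

static int route_fail_closed(unsigned flags, unsigned known_mask,
                             enum route *out)
{
    if (flags & ~known_mask)
        return -1;
    *out = (flags & FLAG_RLP_FRESH) ? ROUTE_RLP : ROUTE_NORMAL;
    return 0;
}
```

| The fail-closed version only helps if the stale build already had
| that check, which is an argument for putting it in before you need
| it.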
| codeulike wrote:
| _coder running down corridor to the trading room, bumping past
| people and sending sheaves of papers flying_
|
| "Power Peg has triggered! Tell them Power Peg has triggered!"
| bovermyer wrote:
| Interesting. Five years prior, this story was posted on this
| blog: https://dougseven.com/2014/04/17/knightmare-a-devops-caution...
| dang wrote:
| Related:
|
| _Knight Capital Says Trading Glitch Cost It $440 Million_ -
| https://news.ycombinator.com/item?id=4329101 - Aug 2012 (90
| comments)
| throwyawayyyy wrote:
| Random, but I interviewed at Knight Capital for a software
| engineering position a few weeks before this all went down. I was
| in London, so the interview was done over the phone. Picture me
| in the evening, handwriting C to solve some problem (the fog of
| time too thick to remember what that problem was), then reading
| out what I'd written, semicolons and all, to the interviewer.
| Because of course there was no shared doc. I did very badly. But
| then, so did they.
| coolhoody wrote:
| > handwriting C /.../ reading out what I'd written, semicolons
| and all, to the interviewer.
|
| I had to re-read it to make sure you are not joking. The fact
| that you were not made me laugh harder.
|
| I'm now just saying "retuuurn" in various exaggerated accents.
| nogridbag wrote:
| Same! Although I interviewed in person in their NYC office. I
| was very junior at the time and the team I interviewed with was
| awesome. I (luckily) didn't get the job. I did a few more
| interviews and accepted an offer from another company where I
| met my wife!
___________________________________________________________________
(page generated 2022-05-02 23:00 UTC)