[HN Gopher] Is something bugging you?
___________________________________________________________________
Is something bugging you?
Author : wwilson
Score : 824 points
Date : 2024-02-13 12:13 UTC (10 hours ago)
(HTM) web link (antithesis.com)
(TXT) w3m dump (antithesis.com)
| loadzero wrote:
| Sounds a bit like jockey applied to qemu. Very neat indeed.
|
| https://www.cs.purdue.edu/homes/xyzhang/spring07/Papers/HPL-...
| voidmain wrote:
| There's indeed a connection between record/replay and
| deterministic execution, but there's a difference worth
| mentioning, too. Both can tell you about the past, but only
| deterministic execution can tell you about alternate histories.
| And that's very valuable both for bug search (fuzzing works
| better) and for debugging (see for example the graphs where we
| show when a bug became likely to occur, seconds before it
| actually occurred).
|
| (Also, you won't be able to usefully record a hypervisor with
| jockey or rr, because those operate in userspace and the actual
| execution of guest code does not. You could probably record
| software cpu execution with qemu, but it would be slow)
|
| I'm a co-founder of Antithesis.
| pfdietz wrote:
| I assume deterministic execution also lets you do failing
| test case reduction.
|
| I've found this sort of high volume random testing w. test
| case reduction is just a game changer for compiler testing,
| where there's much the same effect at quickly flushing out
| newly introduced bugs.
|
| I like the subtle dig at type systems. :)
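|
| A minimal sketch of the failing-test-case reduction mentioned
| above, assuming the failure check is deterministic (same input,
| same verdict). The names `reduce_failing_input` and `toy_bug` are
| hypothetical, and the greedy chunk-removal loop is a simplified
| cousin of delta debugging, not any particular tool's algorithm.

```python
from typing import Callable, List, Sequence


def reduce_failing_input(data: Sequence[int],
                         fails: Callable[[List[int]], bool]) -> List[int]:
    """Greedily drop chunks of `data` while `fails` stays true.

    Requires `fails` to be deterministic: if the same input can pass
    on one run and fail on the next, the reducer chases noise.
    """
    current = list(data)
    assert fails(current), "the starting input must reproduce the failure"
    chunk = max(1, len(current) // 2)
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if fails(candidate):
                current = candidate      # chunk was irrelevant; leave it out
            else:
                i += chunk               # chunk is needed; move past it
        chunk //= 2
    return current


def toy_bug(xs: List[int]) -> bool:
    """Hypothetical failure: a 7 appears somewhere before a 3."""
    return any(x == 7 and 3 in xs[j + 1:] for j, x in enumerate(xs))


reduced = reduce_failing_input([5, 7, 1, 9, 2, 3, 8, 4], toy_bug)
print(reduced)  # [7, 3]
```

| The eight-element input shrinks to the two elements that matter.
| With a flaky predicate the same loop could discard the wrong
| chunk, which is why determinism and reduction pair so well.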
| loadzero wrote:
| I have been down this road a little bit, applying the ideas
| from jockey to write and ship a deterministic HFT system, so
| I have some understanding of the difficulties here.
|
| We needed that for fault tolerance, so we could have a hot
| synced standby. We did have to record all inputs (and outputs
| for sanity checking) though.
|
| We did also get a good taste of the debugging superpowers you
| mention in your blog article. We could pull down a trace from
| a day's trading and replay on our own machines, and skip back
| and forth in time and find the root cause of anything.
|
| It sounds like what you have done is something similar, but
| with your own (AMD64) virtual machine implementation, making
| it fully deterministic and replayable, and providing useful
| and custom hardware impls (networking, clock, etc).
|
| That sounds like a lot of hard but also fun work.
|
| I am missing something though, in that you are not using it
| just for lockstep sync or deterministic replays, but you are
| using it for fuzzing. That is, you are altering the replay
| somehow to find crashes or assertion failures.
|
| Ah, I think perhaps you are running a large number of sims
| with a different seed (for injecting faults or whatnot) for
| your VM, and then just recording that seed when something
| fails.
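|
| A rough sketch of that seed-recording idea, assuming every random
| choice (including injected faults) is drawn from one seeded
| generator. The replica model, the 2% drop rate, and the function
| names are made up for illustration; this is not a description of
| how Antithesis itself works.

```python
import random
from typing import List, Optional


def run_simulation(seed: int, steps: int = 50) -> Optional[str]:
    """Run one simulated history; return a failure description or None.

    All randomness, including fault injection, comes from `rng`, so
    the seed alone determines the entire run.
    """
    rng = random.Random(seed)
    replicas = [0, 0, 0]                   # toy state: one counter per replica
    for step in range(steps):
        value = rng.randint(1, 9)
        for r in range(len(replicas)):
            dropped = rng.random() < 0.02  # injected fault: lose the update
            if not dropped:
                replicas[r] += value
        if len(set(replicas)) != 1:        # invariant: replicas must agree
            return f"step {step}: replicas diverged {replicas}"
    return None


def find_failing_seeds(seeds: range) -> List[int]:
    """Record only the seed; the full trace can be regenerated from it."""
    return [s for s in seeds if run_simulation(s) is not None]


failing = find_failing_seeds(range(200))
```

| Replaying a failure is then just `run_simulation(failing[0])`,
| which returns the same message every time.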
| Rygian wrote:
| The writing is really enjoyable.
|
| > Programming in this state is like living life surrounded by a
| force field that protects you from all harm. [...] We deleted all
| of our dependencies (including Zookeeper) because they had bugs,
| and wrote our own Paxos implementation in very little time and it
| _had no bugs_.
|
| Being able to make that statement and back it by evidence must be
| indeed a cool thing.
| llm_trw wrote:
| I have proved my code has no bugs according to the spec.
|
| I do not make the claim my spec has no bugs.
| yasuocidal wrote:
| "Its not a bug, its a feature"
| coldtea wrote:
| With formal proof systems, you can also claim that for your
| spec.
| svieira wrote:
| A formal proof is only as good as what-you-are-proving maps
| to what-you-intended-to-prove.
| AlotOfReading wrote:
| I've written formal proofs with bugs more than once.
| Reality is much messier than you can encode into any proof
| and there will ultimately be a boundary where the _real_
| systems you're trying to build can still have bugs.
|
| Formal verification is incredibly, amazingly good if you
| achieve it, but it's not the same as "perfect".
| llm_trw wrote:
| No you can't.
|
| You can claim that your spec doesn't violate some
| invariants in a finite number of steps, you can't claim
| that the spec contains all the invariants the real system
| must have and that it doesn't violate them in number of
| steps + 1.
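|
| To make the "finite number of steps" point concrete, here is a
| small bounded check over a toy spec. The state machine and the
| invariant are invented for illustration; real model checkers such
| as TLC do far more than this breadth-first search.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple

State = Tuple[int, int]  # toy spec state: (counter_a, counter_b)


def next_states(state: State) -> Iterable[State]:
    """Toy spec: either counter may be incremented, wrapping at 4."""
    a, b = state
    yield ((a + 1) % 4, b)
    yield (a, (b + 1) % 4)


def check_bounded(initial: State,
                  invariant: Callable[[State], bool],
                  max_depth: int) -> Optional[State]:
    """Breadth-first search of every state reachable within `max_depth` steps.

    Returns a violating state, or None if the invariant held up to the
    bound. None says nothing about step max_depth + 1.
    """
    seen = {initial}
    frontier = deque([(initial, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if not invariant(state):
            return state
        if depth < max_depth:
            for nxt in next_states(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return None


def never_both_three(state: State) -> bool:
    """Invented invariant: the counters are never both 3."""
    return state != (3, 3)


print(check_bounded((0, 0), never_both_three, 5))  # None
print(check_bounded((0, 0), never_both_three, 6))  # (3, 3)
```

| The invariant "holds" at depth 5 and breaks at depth 6, and the
| check can only ever speak for invariants someone thought to write
| down.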
| btrettel wrote:
| The earliest that I've seen the attitude that one should
| eliminate dependencies because they have more bugs than
| internally written code was this book from 1995:
| https://store.doverpublications.com/products/9780486152936
|
| pp. 65-66:
|
| > The longer I have computed, the less I seem to use Numerical
| Software Packages. In an ideal world this would be crazy; maybe
| it is even a little bit crazy today. But I've been bitten too
| often by bugs in those Packages. For me, it is simply too
| frustrating to be sidetracked while solving my own problem by
| the need to debug somebody else's software. So, except for
| linear algebra packages, I usually roll my own. It's
| inefficient, I suppose, but my nerves are calmer.
|
| > The most troubling aspect of using Numerical Software
| Packages, however, is not their occasional goofs, but rather
| the way the packages inevitably hide deficiencies in a
| problem's formulation. We can dump a set of equations into a
| solver and it will usually give back a solution without
| complaint - even if the equations are quite poorly conditioned
| or have an unsuspected singularity that is distorting the
| answers from physical reality. Or it may give us an alternative
| solution that we failed to anticipate. The package helps us
| ignore these possibilities - or even to detect their occurrence
| if the execution is buried inside a larger program. Given our
| capacity for error-blindness, software that actually hides our
| errors from us is a questionable form of progress.
|
| > And if we do detect suspicious behavior, we really can't dig
| into the package to find our troubles. We will simply have to
| reprogram the problem ourselves. We would have been better off
| doing so from the beginning - with a good chance that the
| immersion into the problem's reality would have dispelled the
| logical confusions before ever getting to the machine.
|
| I suppose whether to do this depends on how rigorous one is,
| how rigorous certain dependencies are, and how much time one
| has. I'm not going to be writing my own database (too
| complicated, multiple well-tested options available) but if I
| only use a subset of the functionality of a smaller package
| that isn't tested well, rolling my own could make sense.
| voidmain wrote:
| In the specific case in question, the biggest problem was
| that dependencies like Zookeeper weren't compatible with our
| testing approach, so we couldn't do true end to end tests
| unless we replaced them. One of the nice things about
| Antithesis is that because our approach to deterministic
| simulation is at the whole system level, we can do it against
| real dependencies if you can install them.
|
| I was a co-founder of both FoundationDB and Antithesis.
| spinningD20 wrote:
| That tracks well (both the quotes and your thoughts).
|
| One example that comes to mind where I want to roll my own
| thing (and am in the process of doing so) is replacing our
| ci/cd usage of jenkins that is solely for running qa
| automation tests against PRs on github. Jenkins does way way
| more than we need. We just need github PR
| interaction/webhook, secure credentials management, and
| spawning ecs tasks on aws...
|
| Every time I force myself to update our jenkins instance, I
| buckle up because there is probably some random plugin, or
| jenkins agent thing, or ... SOMETHING that will break and
| require me to spend time tracking down what broke and why.
| 100% surface area for issues, whilst we use <5% of what
| Jenkins actually provides.
| flgstnd wrote:
| the palantir testimonial on the landing page is funny
| CiPHPerCoder wrote:
| Even funnier if you manage to click "Declassify" :)
| flgstnd wrote:
| your IP address is probably in the Palantir databases
| anyway :o
| zellyn wrote:
| And if you highlight the redactions, it reads:
|
| REDACTED REDACTED REDACTED REDACTED REDACTED REDACTED and
| REDACTED REDACTED? REDACTED REDACTED Antithesis REDACTED
| REDACTED REDACTED REDACTED, REDACTED REDACTED REDACTED
| REDACTED. REDACTED REDACTED Palantir REDACTED REDACTED REDACTED
| REDACTED REDACTED REDACTED REDACTED.
|
| :-)
| couchand wrote:
| This sort of awkward joke made to cover for capitalist illogic
| makes us all dumber.
| larsiusprime wrote:
| Was an eaaaaaaaarly tester for this. Pretty neat stuff.
| indiv0 wrote:
| I've been super interested in this field since finding out about
| it from the `sled` simulation guide [0] (which outlines how
| FoundationDB does what they do).
|
| Currently bringing a similar kind of testing in to our workplace
| by writing our services to run on top of `madsim` [1]. This lets
| us continue writing async/await-style services in tokio but then
| (in tests) replace them with a deterministic executor that
| patches all sources of non-determinism (including dependencies
| that call out to the OS). It's pretty seamless.
|
| The author of this article isn't joking when they say that the
| startup cost of this effort is monumental. Dealing with every
| possible source of non-determinism, re-writing services to be
| testable/sans-IO [2], etc. takes a lot of engineering effort.
|
| Once the system is in place though, it's hard to describe just
| how confident you feel in your code. Combined with tools like
| quickcheck [3], you can test hundreds of thousands of subtle
| failure cases in I/O, event ordering, timeouts, dropped packets,
| filesystem failures, etc.
|
| This kind of testing is an incredibly powerful tool to have in
| your toolbelt, if you have the patience and fortitude to invest
| in it.
|
| As for Antithesis itself, it looks very very cool. Bringing the
| deterministic testing down the stack to below the OS is awesome.
| Should make it possible to test entire systems without wiring up
| a harness manually every time. Can't wait to try it out!
|
| [0]: https://sled.rs/simulation.html
|
| [1]: https://github.com/madsim-rs/madsim?tab=readme-ov-
| file#madsi...
|
| [2]: https://sans-io.readthedocs.io/
|
| [3]: https://github.com/BurntSushi/quickcheck?tab=readme-ov-
| file#...
| michael_j_ward wrote:
| > Dealing with every possible source of non-determinism, re-
| writing services to be testable/sans-IO [2], etc. takes a lot
| of engineering effort.
|
| Are there public examples of what such a re-write looks like?
|
| Also, are you working at a rust shop that's developing this
| way?
|
| Final Note, TigerBeetle is another product that was written
| this way.
| wwilson wrote:
| TigerBeetle is actually another customer of ours. You might
| ask why, given that they have their own, very sophisticated
| simulation testing. The answer is that they're so fanatical
| about correctness, they wanted a "red team" for their own
| fault simulator, in case a bug in their tests might hide a
| bug in their database!
|
| I gotta say, that is some next-level commitment to writing a
| good database.
|
| Disclosure: Antithesis co-founder here.
| indiv0 wrote:
| Sure! I mentioned a few orthogonal concepts that go well
| together, and each of the following examples has a different
| combination that they employ:
|
| - the company that developed Madsim (RisingWave) [0] [1]
| tries hardest to eliminate non-determinism with the broadest
| scope (stubbing out syscalls, etc.)
|
| - sled [2] itself has an interesting combo of deterministic
| tests combined with quickcheck+failpoints test case auto-
| discovery
|
| - Dropbox [3] uses a similar approach but they talk about it
| a bit more abstractly.
|
| Sans-IO is more documented in Python [4], but str0m [5] and
| quinn-proto [6] are the best examples in Rust I'm aware of.
| Note that sans-IO is orthogonal to deterministic test
| frameworks, but it composes well with them.
|
| With the disclaimer that anything I comment on this site is
| my opinion alone, and does not reflect the company I work at
| ---- I do work at a rust shop that has utilized these
| techniques on some projects.
|
| TigerBeetle is an amazing example and I've looked at it
| before! They are really the best example of this approach
| outside of FoundationDB I think.
|
| [0]: https://risingwave.com/blog/deterministic-simulation-a-
| new-e...
|
| [1]: https://risingwave.com/blog/applying-deterministic-
| simulatio...
|
| [2]: https://github.com/spacejam/sled
|
| [3]: https://dropbox.tech/infrastructure/-testing-our-new-
| sync-en...
|
| [4]: https://fractalideas.com/blog/sans-io-when-rubber-meets-
| road...
|
| [5]: https://github.com/algesten/str0m
|
| [6]: https://docs.rs/quinn-
| proto/0.10.6/quinn_proto/struct.Connec...
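|
| For readers who have not seen the sans-IO style, a minimal Python
| sketch: the protocol object only turns bytes into events and never
| touches a socket, so a test (or a deterministic simulator) can
| feed it any chunking it likes. `LineProtocol` is a hypothetical
| example, not code from str0m or quinn-proto.

```python
from typing import List


class LineProtocol:
    """A sans-IO line framer: bytes in, complete lines out, no sockets."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def receive_data(self, data: bytes) -> List[bytes]:
        """Feed raw bytes (however they were chunked) and get whole lines."""
        self._buffer += data
        lines: List[bytes] = []
        while True:
            newline = self._buffer.find(b"\n")
            if newline == -1:
                break                      # partial line stays buffered
            lines.append(bytes(self._buffer[:newline]))
            del self._buffer[:newline + 1]
        return lines


proto = LineProtocol()
first = proto.receive_data(b"hel")         # no complete line yet
second = proto.receive_data(b"lo\nwor")    # completes "hello"
third = proto.receive_data(b"ld\n\n")      # completes "world" and an empty line
```

| The real I/O layer just reads from a socket and calls
| `receive_data`; everything worth testing lives in the pure part.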
| Voultapher wrote:
| > you can test hundreds of thousands of subtle failure cases in
| I/O, event ordering, timeouts, dropped packets, filesystem
| failures, etc.
|
| As cool as all this is, I can't help but wonder how often the
| culture of micro-services and distributed computing is ill
| advised. So much complexity I've seen in such systems boils
| down to the fact that calling a "function" is async, depends on
| the OS, is executed at some point or never, and always returns a
| bunch of strings that need to be parsed to re-enter the static
| type system, which comes with its own set of failure modes. This
| makes the seemingly simple task of abstracting logic into a
| named component, aka a function, extremely complex. You don't
| need to test for any of the subtle failures you mentioned if
| you leave the logic inside the same process and just call a
| function. I know monoliths aren't always a good idea or fit, at
| the same time I'm highly skeptical whether the current
| prevalence of service based software architectures is justified
| and pays off.
| pfdietz wrote:
| This sounds quite cool. Although it doesn't say so, I imagine the
| name is a riff on Hypothesis, the testing tool that performs
| automatic test case simplification in a general way.
| intuitionist wrote:
| (I'm an early employee of what was, on my start date, just
| called "Void Star.")
|
| As I recall it, the name meant two things:
|
| 1. Our "autonomous testing" approach is the opposite, or the
| antithesis, of flaky and unreliable testing methodologies.
|
| 2. You can think of our product as standing in dialectical
| opposition to buggy customer software, pointing out its
| internal contradictions (bugs) and together synthesizing a new,
| bug-free software product. (N.b.: I've never actually read
| Hegel.)
|
| We did note the resonance with Hypothesis (a library I like a
| lot!) at the time, but it was just an added bonus :).
| vinnymac wrote:
| I wonder if they are working on a time travel debugger. If it is
| truly deterministic presumably you could visit any point in time
| after a record is made and replay it.
| wwilson wrote:
| No comment. :-)
|
| Disclosure: I am a co-founder of Antithesis.
| bloopernova wrote:
| It looks amazing, nice work!
|
| Do you have any plans to let small open source teams use the
| project for free? Obviously you have bills to pay and your
| customers are happy to do that, but I was wondering if you'd
| allow open source projects access to your service once a week
| or something.
|
| Partly because I want to play with this and I can't see my
| employer or client paying for it! But also it fits neatly
| into "DX", the Developer Experience, i.e. making the
| development cycle as friction free for devs as possible. I'm
| DevOps with a lifelong interest in UX, so DX is something I'm
| excited about.
| wwilson wrote:
| Pricing suitable for small teams, and perhaps even a free
| tier, is absolutely on the roadmap. We decided to build the
| "hard", security-obsessed version of the infrastructure
| first -- single-tenant, with dedicated and physically
| isolated hardware and networking for every customer. That
| means there's a bit of per-customer overhead that we have
| to recoup.
|
| In the future, we will probably have a multi-tenant
| offering that's easier for open source projects to adopt.
| In the meantime, if your project is cool and would benefit
| from our testing, you can try to get our research team
| interested in using it as part of the curriculum that makes
| our platform smarter.
|
| Disclosure: I'm an Antithesis co-founder.
| nlavezzo wrote:
| We've actually done quite a bit of testing on open source
| projects as we've built this, and have discussed doing an
| on-going program of testing open source projects that have
| interested contributors. We'd probably find some
| interesting things and could do some write-ups. Reach out
| to us via our contact page or contact@antithesis.com and
| let's chat.
| _dain_ wrote:
| [I work at Antithesis]
|
| The system can certainly revisit a previous simulated moment
| and replay it. And we have some pretty cool things using that
| capability as a primitive. Check out the probability chart in
| the bug report linked from the demo page:
| https://antithesis.com/product/demo
| xbar wrote:
| Now I want a simulation-run replay scrubbing slider MIDI-
| connected to my Pioneer DJ rig to scratch through our
| troublesome tests as my homies push patched containers.
|
| Seriously: impressive product revelation.
| wwilson wrote:
| Let's do it.
| ismailmaj wrote:
| That's exactly what Tomorrow Corporation uses for their hand
| written game engine and compiler:
| https://www.youtube.com/watch?v=72y2EC5fkcE
| rdtsc wrote:
| That's what rr-project does essentially?
| acemarke wrote:
| Exactly - that's what we've already built for web development
| at https://replay.io :)
|
| I did a "Learn with Jason" show discussion that covered the
| concepts of Replay, how to use it, and how it works:
|
| - https://www.learnwithjason.dev/travel-through-time-to-
| debug-...
|
| Not only is the debugger itself time-traveling, but those time-
| travel capabilities are exposed by our backend API:
|
| - https://static.replay.io/protocol/
|
| Our entire debugging frontend is built on that API. We've also
| started to build new advanced features that leverage that API
| in unique ways, like our React and Redux DevTools integration
| and "Jump to Code" feature:
|
| - https://blog.replay.io/how-we-rebuilt-react-devtools-with-
| re...
|
| - https://blog.isquaredsoftware.com/2023/10/presentations-
| reac...
|
| - https://github.com/Replayio/Protocol-Examples
| chrispy513 wrote:
| This looks to be an incredible tool that was years in the making.
| Excited to see where it goes from here!
| User23 wrote:
| Reminds me of the clever hack of playing back TCP dump logs from
| prod on a test network, but dialed up. Neat.
|
| Naturally I'd prefer professional programmers learn the cognitive
| tools for manageably reasoning about nondeterminism, but they've
| been around over half a century and it hasn't happened yet.
|
| What's really interesting to me is that the simulation adequately
| replicates the real network. One of the more popular criticisms
| of analytical approaches is some variant of: yeah, but the real
| network isn't going to behave like your model. Which by the way
| is an entirely plausible concern for anyone who has messed with
| that layer.
| Rygian wrote:
| What is interesting here is that the solution could fuzz-test
| anything, including the network model, leading to failures even
| more implausible than reality.
| zamfi wrote:
| > Naturally I'd prefer professional programmers learn the
| cognitive tools for manageably reasoning about nondeterminism
|
| It's not an either-or here, though. Part of the challenge is
| you're not always thinking about all the non-determinisms in
| your code, and the interconnections between your code and other
| code (whose behavior you can sometimes only _assume_) can make
| that close to impossible. Part of that is the "your model of
| the network" critique, but also part of that is "your model of
| how people will use your software" isn't necessarily correct
| either.
| kretaceous wrote:
| This might be the best introduction post I've read.
|
| Lays the foundation (get it?) for who the people are and what
| they've built.
|
| Then explains how the current thing they are building is a result
| of the previous thing. It feels that they actually want this
| problem solved for everyone because they have experienced how
| good the solution feels.
|
| Then tells us about the teams (pretty big names with complex
| systems) that have already used it.
|
| All of these wrapped in good writing that appeals to
| developers/founders. Landing page is great too!
| getoffmycase wrote:
| The entire testing system they describe feels like something I
| can strive towards too. They make you want their solution
| because it offers a way of life and thinking and doing like
| you've never experienced before
| foobarqux wrote:
| Except it doesn't actually explain what it does: Is it
| fuzzing? Do you supply your own test cases? Is it testing
| hardware non-determinism?
| Aeolun wrote:
| Yeah. I could figure out the global idea, but then the
| mechanics of how it would actually work were very sparse.
| wwilson wrote:
| Post author here. Sorry it was vague, but there's only so
| much detail you can go into in a blog post aimed at general
| audiences. Our documentation (https://antithesis.com/docs/)
| has a lot more info.
|
| Here's my attempt at a more complete answer: think of the
| story of the blind men and the elephant. There's a thing,
| called fuzzing, invented by security researchers. There's a
| thing, called property-based testing, invented by functional
| programmers. There's a thing, called network simulation,
| invented by distributed systems people. There's a thing,
| called rare-event simulation, invented by physicists (!). But
| if you squint, all of these things are really the same kind
| of thing, which we call "autonomous testing". It's where you
| express high-level properties of your system, and have the
| computer do the grunt work to see if they're true. Antithesis
| is our attempt to take the best ideas from each of these
| fields, and turn them into something really usable for the
| vast majority of software.
|
| We believe the two fundamental problems preventing widespread
| adoption of autonomous testing are: (1) most software is non-
| deterministic, but non-determinism breaks the core feedback
| loop that guides things like coverage-guided fuzzing. (2) the
| state space you're searching is inconceivably vast, and the
| search problem in full generality is insolubly hard.
| Antithesis tries to address both of these problems.
|
| So... is it fuzzing? Sort of, except you can apply it to
| whole interacting networked systems, not just standalone
| parsers and libraries. Is it property-based testing? Sort of,
| except you can express properties that require a "global"
| view of the entire state space traversed by the system, which
| could never be locally asserted in code. Is it fault
| injection or chaos testing? Sort of, except that it can use
| the techniques of coverage guided fuzzing to get deep into
| the nooks and crannies of your software, and determinism to
| ensure that every bug is replayable, no matter how weird it
| is.
|
| It's hard to explain, because it's hard to wrap your arms
| around the whole thing. But our other big goal is to make all
| of this easy to understand and easy to use. In some ways,
| that's proved to be even harder than the very hard
| technological problems we've faced. But we're excited and up
| for it, and we think the payoff could be big for our whole
| industry.
|
| Your feedback about what's explained well and what's
| explained poorly is an important signal for us in this third
| very hard task. Please keep giving it to us!
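|
| A toy illustration of the coverage feedback loop described above,
| assuming the target is deterministic so that "this input reached
| new code" is a stable signal. The target, the mutation strategy,
| and all names are invented for this sketch; it is not how
| Antithesis or any production fuzzer is implemented.

```python
import random
from typing import Optional, Set


def target(data: bytes) -> Set[str]:
    """Toy system under test: returns the branches it reached and raises
    on the 'bug'. Deterministic: same input, same coverage."""
    covered = {"start"}
    if len(data) > 0 and data[0] == ord("b"):
        covered.add("b")
        if len(data) > 1 and data[1] == ord("u"):
            covered.add("bu")
            if len(data) > 2 and data[2] == ord("g"):
                raise RuntimeError("bug reached")
    return covered


def mutate(rng: random.Random, data: bytes) -> bytes:
    """Overwrite one random byte with a random lowercase letter."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(ord("a"), ord("z") + 1)
    return bytes(buf)


def fuzz(seed: int, max_iters: int = 100_000) -> Optional[bytes]:
    """Return an input that triggers the bug, or None if none was found."""
    rng = random.Random(seed)
    corpus = [b"aaa"]
    seen: Set[str] = set(target(corpus[0]))
    for _ in range(max_iters):
        candidate = mutate(rng, rng.choice(corpus))
        try:
            coverage = target(candidate)
        except RuntimeError:
            return candidate               # property violated: report the input
        if not coverage <= seen:           # new behaviour: keep it as a seed
            seen |= coverage
            corpus.append(candidate)
    return None


found = fuzz(seed=0)
```

| Blind guessing would need about 26**3 tries to hit the three-byte
| trigger. Keeping inputs that reach new branches gets there one
| byte at a time, and that only works if coverage for a given input
| does not change between runs.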
| jldugger wrote:
| I remember watching the Strange Loop video on your testing
| strategy, and now I need to go back and relearn how it
| differed from model checking (ie Promela or TLA+). Model
| checking is probably the big QA story that tech companies
| ignore because it requires dramatically more education,
| especially from QA departments typically seen as "inferior"
| to SWE.
| rhodin wrote:
| Video of [0] the Strangeloop talk [1].
|
| [0] https://www.youtube.com/watch?v=4fFDFbi3toc
|
| [1] https://thestrangeloop.com/2014/testing-distributed-
| systems-...
| randomdata wrote:
| _> most software is non-deterministic_
|
| Doesn't Antithesis rely on the fact that software is always
| deterministic? Reproducibility appears to be its top
| selling feature - something that wouldn't be possible if
| software were non-deterministic.
| wwilson wrote:
| We can force any* software to be deterministic.
|
| * Offer only good for x86-64 software that runs on Linux
| whose dependencies you can install locally or mock. The
| first two restrictions we will probably relax someday.
| randomdata wrote:
| Aren't you just 'forcing' determinism in the inputs,
| relying on the software to be always deterministic for
| the same inputs?
| wwilson wrote:
| Nope. We're emulating a deterministic computer, so your
| software can't act nondeterministically if it tries.
| randomdata wrote:
| Right, by emulating a deterministic computer you can
| ensure that the inputs to the software are always
| deterministic - something traditional computing
| environments are unable to offer for various reasons.
|
| However, if we pretend that software was somehow able to
| be non-deterministic, it would be able to evade your
| deterministic computer. But since software is always
| deterministic, you just have to guarantee determinism in
| the inputs.
| _dain_ wrote:
| [I work at Antithesis]
|
| _> But since software is always deterministic, you just
| have to guarantee determinism in the inputs._
|
| This is technically correct, but that's a very load-
| bearing "just". A _lot_ of things would have to count as
| inputs. Think about execution time, for example. CPUs don't
| execute at the same speed all the time because of
| automatic throttling. Network packets have different
| flight times. Threads and processes get scheduled a
| little differently. In distributed/concurrent systems,
| all this matters. If you run the same workload twice,
| observable events will happen at different times and in
| different orders because of tiny deviations in initial
| conditions.
|
| So yes, if you consider the time it takes to run every
| single machine instruction as an "input", then software
| is deterministic given the same inputs. But in the real
| world that's not actionable. Even if you had all those
| inputs, how are you going to pass them in? For all
| intents and purposes most software execution is non-
| deterministic.
|
| The Antithesis simulation _is_ deterministic in this way
| though. It is in charge of how long _everything_ takes in
| "simulated time", right down to the running times of
| individual CPU instructions. Everything observable from
| within the simulation happens the exact same way, every
| time. You can compare a memory dump at the same
| (simulated) instant across two different runs and they
| will be bit-for-bit identical.
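|
| A much smaller analogue of that idea, as a sketch: a scheduler
| that owns the clock. Step durations come from a seeded generator
| instead of the wall clock, so the event order, and therefore a
| hash of the whole trace, is identical on every run. The task model
| and the `run` function are assumptions made for illustration; the
| real system works at the level of an emulated machine, not a
| Python loop.

```python
import hashlib
import heapq
import random
from typing import List, Tuple


def run(seed: int, num_tasks: int = 4, events_per_task: int = 5) -> str:
    """Simulate tasks whose step durations come from a seeded RNG.

    Returns a digest of the full event trace. Simulated time advances
    only when the scheduler pops an event, so wall-clock jitter, CPU
    throttling, and OS scheduling cannot influence the result.
    """
    rng = random.Random(seed)
    # Entries are (simulated_time, task_id, remaining_events). Each task
    # has at most one entry, so task_id breaks timestamp ties and the
    # heap order is total.
    queue: List[Tuple[int, int, int]] = [
        (rng.randint(1, 10), task, events_per_task) for task in range(num_tasks)
    ]
    heapq.heapify(queue)
    trace = hashlib.sha256()
    while queue:
        now, task, remaining = heapq.heappop(queue)
        trace.update(f"{now}:{task}:{remaining};".encode())
        if remaining > 1:
            delay = rng.randint(1, 10)     # simulated duration of the next step
            heapq.heappush(queue, (now + delay, task, remaining - 1))
    return trace.hexdigest()


digest_a = run(seed=42)
digest_b = run(seed=42)
```

| Two runs with the same seed give the same digest, the small-scale
| version of comparing memory dumps. A different seed will
| typically give a different interleaving.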
| randomdata wrote:
| _> Think about execution time, for example._
|
| Sure. A good example. Execution time - more accurately,
| execution speed - isn't a property of software. For
| example, as you point out yourself, you can alter the
| execution speed without altering the software. It is,
| indeed, an input.
|
| _> Even if you had all those inputs, how are you going
| to pass them in?_
|
| Well, we know how to pass them in non-deterministically.
| That's how software is able to do anything.
|
| Perhaps one could create a simulated environment that is
| able to control all the inputs? In fact, I'm told there
| is a company known as Antithesis working on exactly that.
| mlhpdx wrote:
| Oh, that sounds like a challenge...
|
| Is the challenge here the same as with digital
| simulations of electronic circuits? That is, at the end
| of the day analog physics becomes confounding? Or are you
| doing deterministic simulation of random RF noise as
| well?
| pokler wrote:
| That point about dependencies -- how well does this play
| with, or how easy is it to integrate with, a build system like
| Bazel or Buck?
| crdrost wrote:
| This vaguely reminds me of Jefferson's "Virtual Time" paper
| from 1985[1]. The underlying idea at the time didn't really
| take off because it required, like Zookeeper, a greenfield
| project: except that it kinda doesn't and today you could
| imagine instrumenting an entire Linux syscall table and
| letting any Linux container become a virtual time system --
| but Linux didn't exist in 1985 and wouldn't be standard
| until much later.
|
| So Jefferson just says, let's take your I/O-ful process,
| split it into a message-passing actor model, and monitor all the
| messages going in and coming out. The messages coming out,
| they won't necessarily _do what they're supposed to do_
| yet, they'll just be recorded with a plus sign and a
| virtual timestamp, and by assumption eventually you'll
| block on some response. So we have a bunch of recorded
| message timestamps coming in, we have your recorded
| messages going out.
|
| Well, there's a problem here, which is that if we have
| multiple actors we may discover that their timestamps have
| traveled out-of-order. You sent some message at t=532 but
| someone actually sent you a message at t=231 that you might
| have selected instead of whatever you actually selected to
| send the t=532 message. (For instance in the OS case, they
| might have literally sent a SIGKILL to your process and you
| might not have sent anything after that.) That's what the
| plus sign is for, indirectly: we can restart your process
| from either a known synchronization state or else from the
| very beginning, we know all of its inputs during its first
| run so we have "determinized" it up past t=231 to see what
| it does now. Now, it sends a new message at say t=373. So
| we use the opposite of +, the minus sign, to send to all
| the other processes the "undo" message for their t=532
| message, this removes it from their message buffer: that
| will never be sent to them. And if they haven't hit that
| timestamp in their personal processing yet, no further
| action is needed, otherwise we need to roll them back too.
| Doing so you determinize the whole networked cluster.
|
| The only other really modern implementation of these older
| ideas that I remember seeing was Haxl[2], a Haskell library
| which does something similar but rather than using a
| virtual time coordinate, it just uses a process-local
| cache: when you request any I/O, it first fetches from the
| cache if possible and then if that's not possible it goes
| out, fetches the data, and then caches it. As a result you
| can just offer someone a pre-populated cache which, with
| these recorded inputs, will regenerate the offending stack
| trace deterministically.
|
| 1: https://dl.acm.org/doi/10.1145/3916.3988
|
| 2: https://github.com/facebook/Haxl
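|
| The cache-based replay idea in the last paragraph can be sketched
| in a few lines of Python. This is only an illustration of the
| concept: the class and function names are hypothetical and it
| does not mirror Haxl's actual (Haskell) API.

```python
import json
from typing import Callable, Dict


class RecordingFetcher:
    """Wraps a real fetch function and remembers every result by key."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self.cache: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self.cache:
            self.cache[key] = self._fetch(key)   # the only real I/O
        return self.cache[key]


def replaying_fetcher(recorded: Dict[str, str]) -> RecordingFetcher:
    """Build a fetcher that serves only from a pre-populated cache."""
    def no_io(key: str) -> str:
        raise KeyError(f"replay tried to fetch unrecorded key {key!r}")
    fetcher = RecordingFetcher(no_io)
    fetcher.cache = dict(recorded)
    return fetcher


def handler(fetcher: RecordingFetcher) -> str:
    """Hypothetical request handler whose output depends on fetched data."""
    user = fetcher.get("user:1")
    return f"hello {user}, you have {fetcher.get('inbox:' + user)} messages"


# Live run: a dict lookup stands in for the network.
live = RecordingFetcher(lambda key: {"user:1": "ada", "inbox:ada": "3"}[key])
first = handler(live)
snapshot = json.dumps(live.cache)                # ship this with the bug report
replayed = handler(replaying_fetcher(json.loads(snapshot)))
```

| Given the snapshot, the handler reproduces the same output with
| no I/O at all, as long as fetched data is its only source of
| non-determinism.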
| kodablah wrote:
| Has any thought been given to repurposing this
| deterministic computer for more than just autonomous
| testing/fuzzing? For example, given an ability to
| record/snapshot the state, resumable software (i.e. durable
| execution)?
| wwilson wrote:
| Somebody once suggested to me that this could be very
| handy for the reproducible builds folks. I'm sure that now
| that we're out in the open, lots of people will suggest
| great applications for it.
|
| Disclosure: Antithesis co-founder.
| cperciva wrote:
| My favourite application for "deterministic computer" is
| creating a cluster in order to have a virtual machine
| which is resilient to hardware failure. Potentially even
| "this VM will keep running even if an entire AWS region
| goes down" (although that would add significant latency).
| ajb wrote:
| This is interesting - it is kind of picking a fight with
| SaaS/cloud providers though, as that is the one kind of
| software you won't be able to import into your environment:
| not because it can't do the job, but because you don't have
| the code. So this would create an incentive to go back to
| PaaS.
|
| It's definitely true though that a big problem with backend
| is that you can't easily treat it as a whole system for
| test purposes.
| pkghost wrote:
| > it is kind of picking a fight with SaaS/cloud providers
|
| or starting a bidding war
| ajb wrote:
| how so?
| EasyMark wrote:
| thanks, I'll dig in. I'm a very visual person and
| charts/diagrams/flows always help my grasp of something
| more than a wall of text. Maybe include some of those in
| there when you get the time?
| criddell wrote:
| > turn them into something really usable for the vast
| majority of software
|
| Would it work for debugging, say, Notepad on Windows?
| amw-zero wrote:
| Is there more info on how Antithesis solves problem number
| 2 (large state spaces)? I understand the fuzzing / workload
| generation part well, but there's so many different state
| space reduction techniques that I don't know what
| Antithesis is doing under the hood to combat that.
| gitgud wrote:
| _> Your feedback about what's explained well and what's
| explained poorly is an important signal for us in this
| third very hard task. Please keep giving it to us!_
|
| It's hard to understand these complex concepts via language
| alone.
|
| Diagrams would be a huge help to understand how this system
| of testing works compared to existing testing concepts
| kretaceous wrote:
| Sure, it doesn't go into details. And that is exactly why I
| termed it an excellent _introduction_ and a sales pitch.
|
| I haven't heard of deterministic testing before. Nor have I
| heard of FoundationDB or the related things. And I went from
| knowing zero things about them to getting impressed and
| interested. This led me to go into their docs, blogs, landing
| page, etc. to know more.
| k__ wrote:
| Did you read a different article than me?
|
| The linked article is 3/4 about some history and rationale
| before it actually tells you what they build.
|
| It's like those pesky recipe blogs that tell you about the
| author's childhood, when you just want to make vegan pancakes.
| chinchilla2020 wrote:
| It seems like marketing copy. Not a technical blog post.
|
| It would be nice to see some actual use cases and examples.
|
| Instead, the writer just name-dropped a few big companies and
| claimed to have a revolutionary product that works magically.
| Then include the typical buzzwords like '10x programmer' and
| 'stealth mode'. The latter doesn't make sense because they also
| name-drop clients.
| sneak wrote:
| Imagine being proud of working for Palantir.
| mgfist wrote:
| Your life depends on lots of unsavory tasks.
| sneak wrote:
| Yes, like sewage pipe maintenance. Not data mining to figure
| out who to assassinate without trial.
|
| Using the "unsavory" euphemism for unethical and illegal
| violence is somewhat of a deception, is it not?
| jitl wrote:
| To me this is very reminiscent of time travel debugging tools
| like the one used for Firefox's C++ code, rr / Pernosco:
| https://pernos.co/
| rvnx wrote:
| Seems more like a fuzzer for Docker images.
|
| Like this:
| https://docs.gitlab.com/ee/user/application_security/coverag...
|
| It won't tell you whether the software works correctly, it will
| just tell you if it raises an exception or crashes.
|
| Put a fuzzer on Chrome for example, you won't catch most of the
| issues it has, though Chrome actually has tons of bugs and
| issues, but you _may_ find security issues if you devote a big
| enough budget to run your fuzzer long enough to cover all
| the branches.
|
| So it's good in the case where you use "exceptions as tests",
| where any minor out-of-scope behavior raises an exception and
| all the cases are pre-planned (a bit like you baked-in runtime
| checks, and the fuzzer explores them)
| jitl wrote:
| The similarity is about obtaining determinism through
| something like a hypervisor. The way rr works is it basically
| writes down the result of all the system calls, etc,
| basically everything that ended up on the Turing machine's
| tape, so you can rewind and replay.
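
The "write down the results, then replay them" idea from the comment above can be illustrated at the level of ordinary function calls. This is a toy analogue under stated assumptions, not how rr is implemented: the `Recorder` and `Replayer` classes and the `program` function are hypothetical names made up for this sketch, and the nondeterminism here is just a random number generator and the wall clock.

```python
import random
import time


class Recorder:
    """Runs nondeterministic calls for real and logs each result."""

    def __init__(self):
        self.log = []

    def call(self, fn, *args):
        result = fn(*args)
        self.log.append(result)
        return result


class Replayer:
    """Feeds back logged results in order, never touching the outside world."""

    def __init__(self, log):
        self._results = iter(log)

    def call(self, fn, *args):
        return next(self._results)


def program(io):
    # Every source of nondeterminism goes through `io`, so the rest of
    # the program is a pure function of the recorded results.
    first = io.call(random.randint, 1, 6)
    second = io.call(random.randint, 1, 6)
    stamp = io.call(time.time)
    return (first + second, stamp)


recorder = Recorder()
original = program(recorder)
replayed = program(Replayer(recorder.log))
```

Replaying from the log reproduces the original run exactly, which is what makes rewinding and re-running possible; it cannot, on its own, explore runs that never happened.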
| intrasight wrote:
| >It's pretty weird for a startup to remain in stealth for over
| five years.
|
| Not really. I have friends who work for a startup that's been in
| "stealth" for 20 years. Stealth is a business model not a phase.
| jimbokun wrote:
| > The biggest effect was that it gave our tiny engineering team
| the productivity of a team 50x its size.
|
| I feel like the idea of the legendary "10x" developer has been
| bastardized to just mean workers who work 15 hours a day 6.5 days
| a week to get something out the door until they burn out.
|
| But here's your real 10x (or 50x) productivity. People who
| implement something very few people even considered or understood
| to be possible, which then gives amazing leverage to deliver
| working software in a fraction of the time.
| FirmwareBurner wrote:
| Your definition is also vague. Someone still needs to do the
| legwork. One man armies who can do everything themselves don't
| really fit in standardized teams where everything is
| compartmentalized and work divided and spread out.
|
| They work best on their own projects with nobody else in their
| way, no colleagues, no managers, but that's not most jobs. Once
| you're part of a team, you can't do too much work yourself no
| matter how good you are, as inevitably the other slower/weaker
| team members will slow you down as you'll fight dealing with
| the issues they introduce into the project or the issues from
| management, so every team moves at the speed of the lowest
| common denominator no matter their rockstars.
| jollyllama wrote:
| That rings true and is probably why the 10x engineers I have
| seen usually work on devops or modify the framework the other
| devs are using in some way. For example, an engineer who
| speeds up a build or test suite by an order of magnitude is
| easily a 10x engineer in most organizations, in terms of man
| hours saved.
| FirmwareBurner wrote:
| _> For example, an engineer who speeds up a build or test
| suite by an order of magnitude is easily a 10x engineer in
| most organizations, in terms of man hours saved._
|
| Yeah but this isn't something scalable that can happen
| regularly as part of your job description. Like most
| jobs/companies don't have so many low hanging fruits to
| pick that someone can speed up a build by orders of magnitude
| on a weekly basis. It's usually a one time thing. And one
| time things don't usually make you a 10x dev. Maybe you
| just got lucky once to see something others missed.
|
| And often times at big places most people know where the
| low hanging fruits are and can fix them, but management,
| release schedules and tech debt are perpetually in the way.
|
| IMHO what makes you a 10x dev is you always know how to
| unblock people no matter the issue so that the project is
| constantly smooth sailing, not chasing orders of magnitude
| improvements unicorns.
| tranceylc wrote:
| Does anyone else feel like people follow these sort of
| industry pop-culture terms a bit too intensely? What I
| mean is that the existence of the term tends to bring out
| people trying to figure who that might be, as if it has
| to be 100% true.
|
| I personally think that some people can provide "10x"
| (arbitrary) the value on occasion, like the low hanging
| fruit you said. I also believe some people are slightly
| more skilled than others, and get more results out of
| their work. That said, there are so many ways for
| somebody to have an impact that doesn't have to be
| immediate, that I find the term itself too prevalent.
| lukan wrote:
| "Does anyone else feel like people follow these sort of
| industry pop-culture terms a bit too intensely? "
|
| Agreed, there is too much effort going into the
| "superstars" theme, but there are definitely people who
| get 10x done in the same time as others.
| t-3 wrote:
| Yep. No matter what you're doing, some people are more
| productive than others. Often it's a matter of experience
| and practice, sometimes ability to focus, sometimes
| motivation, rarely it's a lack or surplus of inherent
| ability. Using people effectively in the context of a
| team all depends on the skill of the manager though.
| jollyllama wrote:
| It really does depend on where you work. The order of
| magnitude improvements I'm describing involved
| interdisciplinary expertise involving both bespoke
| distributed build systems and assembly language. They're
| not unicorns, they do exist, but they are very rare and
| most engineers just aren't going to be able to find them,
| even with infinite time. Hence why a 10x engineer is so
| valuable and not everyone can be one. I myself am
| certainly not one, in most contexts.
| vdqtp3 wrote:
| > Like most jobs/companies don't have so many low hanging
| fruits to pick that someone can speed up a build by orders
| of magnitude on a weekly basis
|
| You and I have worked at very different organizations.
| Everywhere I've been has had insane levels of
| inefficiency in literally every process.
| ejb999 wrote:
| same here - it is especially bad in huge companies, the
| inefficiencies and waste are legendary.
| FirmwareBurner wrote:
| _> insane levels of inefficiency in literally every
| process._
|
| In processes yes, not in code, and solo 10x devs alone
| can't fix broken processes as those are the effect of
| broken management and engineering culture.
|
| People know where the inefficiencies are, but management
| doesn't care.
| theamk wrote:
| Nothing wrong with "one man armies" in the team context.
| There is a long list of tasks that need to be done. Over the
| same time period, one person will do 5 complex tasks (with
| tests and documentation), while the other will do just 1
| task, and then spend even more time redoing it properly.
|
| Over time this produces funny effects, like a super-big 20
| point task done in a few days because the wrong person started
| working on it.
| giantg2 wrote:
| I'm tired of hearing about 10x engineers. I just want to be a
| good 1x engineer. Or good at anything in life really.
| datameta wrote:
| The truest 10x engineer I ever encountered was a memory
| firmware guy with ASIC experience who absolutely made sure to
| log off at 5 every day after really putting in the work. Go-to
| guy for all parts of the codebase, even that which he
| didn't expressly touch.
| harryvederci wrote:
| > I'm tired of hearing about 10x engineers.
|
| "The truest 10x engineer I ever encountered was..."
| JimDabell wrote:
| The "10x engineer" comes from the observation that there is a
| 10x difference in productivity between the best and the worst
| engineers. By saying that you want to be a 1x engineer,
| you're saying you want to be the least productive engineer
| possible. 1x is not the average, 1x is the worst.
| mathgradthrow wrote:
| the worst engineer certainly has negative productivity, so
| I'm not sure that your explanation can possibly be the
| correct one.
| JimDabell wrote:
| I'm explaining what the terms "10x" and "1x" mean, not
| asserting that the original observation is correct under
| all circumstances.
| mathgradthrow wrote:
| I believe the original was for an entire "organization's"
| performance, and was also done in 1977. Since they are
| averages, it makes "sense" to conclude that the best of a
| good team is 10x better than the average of the worst
| team. Not really what the experiment concludes but what
| can you do.
| JimDabell wrote:
| The first was 1968, but there have been more studies
| since.
|
| https://www.construx.com/blog/the-origins-of-10x-how-valid-i...
| randomdata wrote:
| Except you haven't explained it at all. Sackman,
| Erickson, and Grant found that some developers were able
| to complete what was effectively a programming contest in
| a 10th of the time of the slowest participants. This is
| the origin of the 10x developer idea.
|
| You, on the other hand, are claiming that 10x engineers
| are 10 times more productive than the worst engineers.
| Completing a programming challenge in a 10th of the time
| is not the same as being 10 times more productive, and
| obviously your usage can't be an explanation, even as one
| you made up on the spot, as the math doesn't add up.
| JimDabell wrote:
| That was designed as a repeatable experiment, which seems
| entirely reasonable when you want to conduct a study. Why
| are you characterising that as "a programming contest"?
| That seems like an uncharitably distorted way of
| describing a study.
|
| That study also does not exist in isolation:
|
| https://www.construx.com/blog/the-origins-of-10x-how-valid-i...
| randomdata wrote:
| _> Why are you characterising that as "a programming
| contest"?_
|
| Because it was? Do you have a better way to repeatedly
| test _performance_? And yes, the study's intent was to
| look at _performance_, not productivity. It's even right
| in the title. Not sure where you dreamed up the latter.
| randomdata wrote:
| I'm not sure your math works.
|
| What we do know is that the worst engineers provide
| negative productivity. If 1x is the worst engineer, then
| let's for the sake of discussion denote x as -1 in order
| for the product to be negative. Except that means the 10x
| engineer provides -10 productivity, actually making them
| the worst engineer. Therein lies a conflict.
|
| What we also know is that the best engineer has positive
| productivity, so that means the multiplicand must always be
| positive. Which means that it is the multiplier that must
| go negative, meaning that a -1x and maybe even a -10x
| engineer exists.
| JimDabell wrote:
| You are arguing against the idea that there is a factor
| of ten difference in productivity between the best and
| the worst engineers. That's fine if you want to do that,
| but that's explicitly where the term "10x engineer" comes
| from and what defines its meaning. So if you disagree
| with the underlying concept, there is no way for you to
| use terms like "[n]x engineer" coherently since you
| disagree with its most fundamental premise. You certainly
| shouldn't reinvent different meanings for these terms.
| moritzwarhier wrote:
| Thank you. This sounds so trivial at first, but your
| reductio ad absurdum at the beginning of your comment
| really nails it.
|
| Throw into the mix the fact that productivity is hard to
| measure as soon as more than one person works on
| something and that doesn't even begin to consider the
| economical aspects of software.
|
| And even when ignoring this point, there's that pesky
| short-term vs long-term thing.
|
| Also, how do you define the term "productivity"? I was
| assuming that you mean something along the lines of
| (indirect, if employed) monetary output.
| margalabargala wrote:
| You're not _wrong_, but I think you may be treating
| something as literal math, when it is in fact idiomatic
| labels used to express trends.
| randomdata wrote:
| The problem here is the introduction of productivity.
|
| The 10x developer originated from a study that measured
| _performance_. The 10x developer being able to do a task
| in a 10th of the time is quite conceivable and reflects
| what the study found. I'm sure we've all seen a
| developer take 10 hours to do a job that would take
| another developer just 1 hour. Nobody is doing it in
| negative hours, so the math works.
|
| But performance is not the same as productivity.
| hattmall wrote:
| Hmm, I never thought of it that way. I just heard 10x
| employees and fit it to what I knew. Which is that 90% of
| the work is accomplished by about 10% of workers. The other
| 90% really only get 10% done. So most developers are
| somewhere on a scale of 0.1 - 1. With 1 being a totally
| competent and good developer. The 10x people are just
| different though, it's like a pro-athlete to a regular
| player. It's not unique to software development, though it
| may stand out and be sought after more. I've noticed it in
| pretty much every industry. Some people are just able to
| achieve flow state in their work and be vastly more
| productive than others, be it writing code or laying sod. I
| don't find that there's a lot of in between 1 and 10
| though.
| SkyBelow wrote:
| Even if this was the origin of the term, it still doesn't
| make sense because the best engineers can solve problems
| the worst would never be able to solve. The difference
| between the best and worst is much more than 10x the worst.
| Maybe the worst who meets certain minimums at a company,
| but then the best would also be limited by those willing to
| work for what the company pays, and I hypothesize that the
| minimums of the lower bound and the maximums of the upper
| bound are correlated.
| JimDabell wrote:
| It sounds like you disagree with the concept of a 10x
| engineer then. In which case you should avoid using the
| term, rather than making up a new definition.
| robocat wrote:
| Concepts and words change meaning and sometimes we all
| need to accept that the popular meaning is not the
| definition we use.
|
| This is especially common when dealing with historical or
| academic definitions versus common modern usage.
| "Evolution" particularly annoys me.
|
| You should avoid using the term, rather than using a
| definition at odds with common usage. Your usage is
| confusing - and that is why you are getting push-back.
|
| The definition you have given is nonsensical - it can't
| be consistent over time or between companies because it
| depends on finding a minimum in a group. And a value that
| is strongly dependent on the worst developer is useless
| because it mostly measures how bad the worst developer is
| - it doesn't say anything about how good the best
| developer is.
| tnel77 wrote:
| It depends on the day if I feel like a 2x or a 0.1x engineer.
| Keep at it. You are not alone!
| loeg wrote:
| Spend less time on HN and you might get more done.
| tomsthumb wrote:
| Do you want to read hacker news or be hacker news?
| Xeyz0r wrote:
| You took the words right out of my mouth
| AlienRobot wrote:
| Do 10x engineers get 10x the wages? Somehow I feel being
| exceptionally better than other engineers is just unfair to
| both of you and the ones worse than you. I wouldn't want to
| be a 10x either, I'd rather just be normal engineer.
| tantaman wrote:
| Meta compensates 10x types very well. 3x bonus multipliers,
| additional equity that can range from 100k-1m+, and level
| increases are a huge bump to comp (https://www.levels.fyi/)
| chinchilla2020 wrote:
| I have many meta colleagues I've worked with in the past.
| All of them are well compensated but none of them were
| outstanding, or 10x.
| ponector wrote:
| Once you have a few years of experience, you don't need to be
| 10x to have success. You can be a reliable 1.3x, a little bit
| better than your teammates.
|
| In the end it doesn't matter, whole team could be laid off at
| once.
| hyperthesis wrote:
| I think getting something worthwhile done is a better focus
| (actually quite hard!), and naturally increases your
| productivity as a side-effect.
|
| Productivity has no inherent value - like efficiency and
| perfection, it is necessarily of something else. Its value is
| entirely derived.
| didgetmaster wrote:
| It seems like the industry would get a lot more 10x behavior if
| it was recognized and rewarded more often than it currently
| is. Too often, management will focus more on the guy who
| works 12 hour days to accomplish 8 hours of real work than the
| guy who gets the same thing accomplished in an 8 hour day.
| Also, deviations from 'normal' are frowned upon. Taking time to
| improve the process isn't built into the schedule; so taking
| time to build a wheelbarrow is discouraged when they think you
| could be hauling buckets faster instead.
| Terretta wrote:
| It's almost impossible to get executives to think in return
| on equity ("RoE") for the future instead of "costs" measured
| in dollars and cents last quarter.
|
| Which is weird, since so many executives are working in a VC-
| funded environment, and internal work should be "venture
| funded" as well.
| happytiger wrote:
| That's because most executives can't understand technology
| deeply enough to know the difference.
| didgetmaster wrote:
| Even when they are smart enough to know, they seem to have
| very short memories. While I don't consider myself to be a
| 10x engineer; I have certainly done a number of 10x things
| over my career.
|
| I worked for a company where I almost single handedly built
| a product that resulted in tens of millions of dollars in
| sales. I got a nice 'atta boy' for it, but my future ideas
| were often overridden by someone in management who 'knew
| better'. After the management changed, I found myself in a
| downsizing event once I started criticizing them for a lack
| of innovation.
| KuriousCat wrote:
| This is the sad part of it, many people without core
| competence end up in "leadership" positions and remove
| any "perceived" threats to their authority. I believe
| part of it is due to the absence of leadership training
| in the engineering curriculum. Colleges should encourage
| engineers to take up a few leadership courses and get them
| trained on things like Influence and Power.
| sangnoir wrote:
| >It seems like the industry would get a lot more 10x behavior
| if it was recognized and rewarded more often than it
| currently is
|
| I'd be happier if industry cares more for team productivity -
| I have witnessed how rewarding "10x" individuals may lead to
| perverse results on a wider scale, a la Cobra Effect. In one
| insidious case, our management-enabled, long-tenured "10x"
| rockstar fixed all the big customer-facing bugs quickly, but
| would create multiple smaller bugs and regressions for the 1x
| developers to fix while he moved to the next big problem
| worthy of his attention. Everyone else ended up being 0.7x -
| which made the curse of an engineer look even more productive
| comparatively!
|
| Because he was allowed to break the rules, there was a
| growing portion of the codebase that only he could work on -
| while it wasn't Rust, imagine an org has a "No Unsafe Rust"
| rule that is optional to 1 guy. Organizations ought to be
| _very_ careful how they measure productivity, and should
| certainly look beyond first-order metrics.
| lifeisstillgood wrote:
| I try to look at these things through the lens of "software
| literacy" - software is a form of literacy and this story
| might be better viewed as "a bunch of illiterate managers
| are impressed with one good writer at the encyclopedia
| publishers, now it turns out this guy makes mistakes, but
| hey, what do you expect when the management cannot read or
| write!"
| SomeCallMeTim wrote:
| This reminds me of the "Parable of the Two Programmers." [1]
| A story about what happens to a brilliant developer given an
| identical task to a mediocre developer.
|
| [1] I preserved a copy of it on my (no-advertising or
| monetization) blog here:
| https://realmensch.org/2017/08/25/the-parable-of-the-two-pro...
| mjevans wrote:
| I can't seem to find it in a google search, maybe I'm just
| recalling entirely the wrong terms.
|
| In the early computing era there was a competition.
| Something like take some input and produce an output. One
| programmer made a large program in (IIRC) Fortran with
| complex specifications documentation etc. The other used
| shell pipes, sort, and a small handful or two of other
| programs in a pipeline to accomplish the same task in like
| 10 developer min.
| ianmcgowan wrote:
| Sounds like "Knuth vs McIlroy", which has been discussed
| on hn and elsewhere before, and the general take is that
| it was somewhat unfair to Knuth.
|
| [1] https://homepages.cwi.nl/~storm/teaching/reader/BentleyEtAl8...
| [2] https://www.google.com/search?q=knuth+vs+mcilroy
| ramses0 wrote:
| The Knuth link in the sibling comment is an original, but
| you're probably thinking of "The Tao of Programming"
|
| http://catb.org/~esr/writings/unix-koans/ten-thousand.html
|
| """"And who better understands the Unix-nature?" Master
| Foo asked. "Is it he who writes the ten thousand lines,
| or he who, perceiving the emptiness of the task, gains
| merit by not coding?""""
| SomeCallMeTim wrote:
| I was both of those developers at different times, at
| least metaphorically.
|
| I drank from the OO koolaid at one point. I was really
| into building things up using OOD and creating
| extensible, flexible code to accomplish everything.
|
| And when I showed some code I'd written to my brother, he
| (rightly) scoffed and said that should have been 2-3
| lines of shell script.
|
| And I was enlightened. ;)
|
| Like, I seriously rebuilt my programming philosophy
| practically from the ground up after that one comment.
| It's cool having a really smart brother, even if he's
| younger than me. :)
| _a_a_a_ wrote:
| Without more backup I can only describe that as being
| fiction. Righteous fiction, where the good guy gets
| downtrodden and the bad guy wins to fuel the reader's
| resentment.
| 6510 wrote:
| To me it is a story about managers clueless about the
| work. You can make all the effort in the world to imagine
| doing something but the taste of the soup is in the
| eating. I do very simple physical grunt work for a
| living, there it is much more obvious that it is
| impossible. It's truly hilarious.
|
| They probably deserve more praise when they do guess
| correctly but would anyone really know when it happens?
| SomeCallMeTim wrote:
| It's practically my life experience.
|
| Sometimes I'm appreciated, and managers actually realize
| what they have when I create something for them.
| Frequently I accomplish borderline miracles and a manager
| will look at me and say, "OK, what about this other
| thing?"
|
| My first job out of college, I was working for a company
| run by a guy who said to me, "Programmers are a dime a
| dozen."
|
| He also said to me, after I quit, after his client
| refused to give him any more work unless he guaranteed
| that I was the lead developer on it, "I can't believe you
| quit." I simply shrugged and thought, "Maybe you
| shouldn't have treated me like crap, including not even
| matching the other offer I got."
|
| I've also made quite a lot of money "Rescuing Small
| Companies From Code Disasters. (TM)" ;) Yes, that's my
| catch phrase. So I've seen the messes that teams often
| create.
|
| The "incompetent" team code description in the story is
| practically prescient. I've seen the results of exactly
| that kind of management and team a dozen times. Things
| that, given the same project description, I could have
| created in 1/100 the code and with much more overall
| flexibility. I've literally thrown out entire projects
| like that and replaced them with the much smaller,
| tighter, and faster code that does more than the original
| project.
|
| So all I can say is: Find better teams to work with if
| you think this is fiction. This resonates with me because
| it contains industry Truth.
| 6510 wrote:
| I had an idea once but when I tried to explain it people
| didn't understand.
|
| I revisited earlier thought: communication is a 2 man job,
| one is to not make an effort to understand while the other
| explains things poorly. It always manages to never work
| out.
|
| Periodically I thought about the puzzle and was eventually
| able to explain it such that people thought it was
| brilliant ~ tho much too complex to execute.
|
| I thought about it some more, years went by and I
| eventually managed to make it easy to understand. The
| response: "If it was that simple someone else would have
| thought of it." I still find it hilarious decades later.
|
| It pops to mind often when I rewrite some code and it goes
| from almost unreadable to something simple and elegant. Ah,
| this must be how someone else would have done it!
| drekipus wrote:
| > Ah, this must be how someone else would have done it!
|
| This is a good exclamation :D
|
| And it's a poignant story. Thanks for sharing.
| lifeisstillgood wrote:
| That's pretty good. It needs an Athena poster :-)
| HenryBemis wrote:
| "Give me six hours to chop down a tree and I will spend the
| first four sharpening the axe."
|
| -- Abraham Lincoln
|
| I have started to follow this 'lately' (for a decade) and
| it has worked miracles. As for the anxious
| managers/clients, I keep them updated of the
| design/documentation/thought process, mentioning the risks
| of the path-not-taken, and that maintains their peace of
| mind. But this depends heavily on the client and the
| managers.
| ransom1538 wrote:
| Honestly? You work at a place a manager hasn't heard "impact"
| yet? I thought managers at this point just walk around the
| office saying "impact".
| PH95VuimJjqBqy wrote:
| > It seems like the industry would get a lot more 10x
| behavior if it was recognized and rewarded more often than it
| currently is.
|
| I don't agree with that, there are a _lot_ of completely crap
| developers and they get put into positions where even the
| ones capable of doing so aren't allowed to because it's not
| on a ticket.
|
| I've seen some things.
| throwitaway222 wrote:
| No one reading this during the hours of 9-5 is a 10x.
| randomdata wrote:
| Or is. If a 1x puts in an 8 hour day, a 10x only has to put
| in a 48 minute day. That leaves plenty of time to read this.
| simmerup wrote:
| That's a bad take because you're assuming that developer is
| capable of replicating that * 10
| adra wrote:
| That's entirely the fundamental flaw of the Nx developer
| ethos to a tee. No individual will benchmark reliably
| against any other person of their same trade/craft
| perfectly over time. The mythical BS times developer is
| so oversimplified as to be a meaningless concept. Hire
| "unicorn" and get amazing results just isn't a guarantee.
| They just probably have better chance than average to
| make a higher impact, which is good enough for companies
| that are willing to pay Nx times average salaries to
| acquire them.
| sebastianz wrote:
| His point is that smart and productive people are generally
| hard working, focused and diligent, which is how they get
| to be so experienced and productive.
|
| Hence not wasting time on social networks.
|
| > a 10x only has to put in a 48 minute day
|
| Nobody would call this person "10x".
| randomdata wrote:
| _> His point is that smart and productive people are
| generally hard working, focused and diligent_
|
| I don't think that tracks. Smart, productive, hard
| working people don't work 9-5. They work every hour they
| can, breaking only when they have pushed themselves to
| the limit. The limit can be hit at any hour. There is no
| magical property of the universe that gives people
| unlimited stamina during the hours of 9-5.
|
| _> Nobody would call this person "10x"._
|
| I'm not sure they would call anyone that, to be fair. A
| "10x developer" who also puts in 8 hours alongside the 1x
| developers isn't a 10x developer, he would be called a
| _sucker_.
| mewpmewp2 wrote:
| Hackernews is hardly a waste of time though. 10x is
| probably curious of topics mentioned on Hackernews.
| adra wrote:
| I know it's meant to be funny, but the number of tech people
| who spend zero time learning about "what's out there", are
| usually not the most effective developers. You won't find
| better solutions to existing or even new problems without an
| interest in industry. Maybe this particular article isn't
| "industry valuable fair enough", but having zero interest in
| refining and enhancing your craft beyond the work in front of
| you is almost guaranteed to end with worse outcomes.
| eszed wrote:
| Hard agree.
|
| Another flaw in his thinking: brain cycles and sub-
| conscious processing.
|
| I'm in the middle of a hard problem right now. I ran out of
| ideas, and opened HN about half an hour ago. In that time,
| without "trying", I've had two new ideas - one sent me back
| to my notes, which revealed that my original thinking was
| flawed; the second sent me to documentation, which
| suggested a new route to pursue. I'm digesting the
| implications of that while I write this.
|
| Beating my head against the problem directly for thirty
| minutes would have been less productive. (Though if I
| wasn't WFH I would have, and also been miserable, and
| learned less about the industry than I have from this
| thread. So there's that.)
|
| I'm far from a 10x _anything_, but I don't have the only
| brain which works this way.
| andrei_says_ wrote:
| On my team, one of the main multipliers is understanding the
| need behind the requested implementation, and proposing
| alternative solutions - minimizing or avoiding code changes
| altogether. It helps that we work on internal tooling and are
| very close to the process and stakeholders.
|
| "Hmmm, there's another way to accomplish this" being the 10x.
| Doing things faster is not it.
| switch007 wrote:
| Exactly this. It's why it's so frustrating when product
| managers who think they're above giving background run the
| show (the ones who think they're your manager and are
| therefore too important to share that with you)
| mettamage wrote:
| When I was in college, I met a few people who coded _a lot_
| faster than me. Typically, they had started at 12
| instead of 21 (like me). That's how 10x engineers exist: by the
| time they are 30, they have roughly 20 years of programming
| experience under their belt instead of 10.
|
| Also, their professional experience is much greater. Sure,
| their initial jobs at 15 are the occasional weird gig for the
| uncle/aunt or cousin/nephew but they get picked up by
| professional firms at 18 and do a job next to their CS studies.
|
| At least, that's how it used to be. Not sure if this is still
| happening due to the new job environment, but this was the
| reality from around 2004 to 2018.
|
| For 10x engineers to exist, all it takes is a few examples. To
| me, everyone is in agreement that they seem to be rare. I point
| to a public 10x engineer. He'd never say it himself, but my
| guess is that this person is a 10x engineer [1].
|
| If you disagree, I'm curious how you'd disagree. I'm just a
| blind man touching a part of the elephant [2]. I do not claim
| to see the whole picture.
|
| [1] https://bellard.org/ (the person who created JSLinux)
|
| [2] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant -
| if you don't know the parable, it's a fun one!
| QuercusMax wrote:
| Yup, that's been my experience as someone who asked for a C++
| compiler for my 12th birthday, worked on a bunch of random
| websites and webapps for friends of the family, and spent
| some time at age 16-17 running a Beowulf cluster and
| attempting to help postdocs port their code to run on MPI
| (with mixed success). All thru my CS education I was writing
| tons of toy programs, contributing (as much as I could)
| toward OSS, reading lots of stuff on best practices, and
| leaning on my much older (12 years) brother who was working
| in the industry. He pointed me to Java and IntelliJ, told me
| to read Design Patterns (Gang of Four) and Refactoring
| (Fowler). I read Joel on Software religiously, even though he
| was a Microsoft guy and I was a hardcore Linux-head.
|
| By the time I joined my first real company at age 21, I was
| ready to start putting a lot of this stuff into place. I
| joined a small med device software company which had a great
| product but really no strong software engineering culture:
| zero unit tests, using CVS with no branches, release builds
| were done manually on the COO's workstation, etc.
|
| As literally the most junior person in the company I worked
| through all these things and convinced my much more senior
| colleagues that we should start using release branches
| instead of "hey everybody, please don't check in any new code
| until we get this release out the door". I wrote automated
| build scripts mostly for my own benefit, until the COO
| realized that he didn't have to worry about keeping a dev
| environment on his machine, now that he didn't code any more.
| I wrote a junit-inspired unit testing framework for the
| language we were using
| (https://en.wikipedia.org/wiki/IDL_(programming_language) -
| like Matlab but weirder).
|
| Without my work as a "10x junior engineer", the company would
| have been unable to scale to more than 3 or 4 developers. I
| got involved in hiring and made sure we were hiring people
| who were on board with writing tests. We finally turned into
| a "real" software company 2 or 3 years after I joined.
| mettamage wrote:
| This sounds similar to the best programmer I personally
| know and he was an intern working at LLVM at the time. It's
| funny how companies treat that part of his life as "no
| experience". Then suddenly he goes into the HFT space and
| within a couple of years he has a rank similar to that of people
| twice his age.
|
| 10x engineers exist. To be fair, it does depend which
| software engineer you see as "the standard software
| engineer", but if I take myself as a standard (as an
| employed software engineer with 5 years of experience),
| then 10x software engineers exist.
| nlavezzo wrote:
| Nick with Antithesis here with a funny story on this.
|
| I became friends with Dave our CTO when I was 5 or 6, we were
| neighbors. He'd already started coding little games in Basic
| (this was 1985). Later in our friendship, like when I was
| maybe 10, I asked him if he could help me learn to code,
| which he did. After a week or two I had made some progress
| but compared what I could do to what he was doing and figured
| "I guess I just started too late, what's the point?".
|
| I found out later that most people didn't start coding till
| late HS or college! It worked out though - I'm programmer
| adjacent and have taken care of the business side of our
| projects through the years :)
| theamk wrote:
| Last year, we had 2 new hires: one fresh out of college
| (and not one of the top ones), the other with 15 years of
| experience on the resume in our industry.
|
| I am not sure there is 10x difference, but there is at least
| 5x difference in performance, in favor of fresh college grad,
| and they are now working on the more complex tasks too.
|
| The sad part is our hiring is still heavily in "senior
| engineer with lots of experience" phase, and the internship
| program has been canceled.
| jjjjj55555 wrote:
| Some people organize their time and focus their efforts more
| efficiently than others. They also use tools that others
| might not even know or care about.
|
| You probably surf the internet 10x faster than your parents.
| Yes you've probably had more exposure than them, but you
| could probably teach them how to do it just as fast. But
| would they want to learn and would they actually adopt what
| you taught them?
| joantune wrote:
| With motivation and repetition, yes! And those depend on how
| plastic your brain is, and thus on age.
| SomeCallMeTim wrote:
| Yes: Programmers who start at twelve are often the 10x
| programmers who can really program faster than the average
| developer by a lot.
|
| No: It's not because they have 10 more years of experience.
| Read "The Mythical Man-Month." That's the book that
| popularized the concept that some developers were 5-25x
| faster than others. One of the takeaways was that the speed
| of a developer was not correlated with experience. At all.
|
| That said, the kind of person who can learn programming at 12
| might just be the kind of person who is really good at
| programming.
|
| I started learning programming concepts at 11-12. I'm not the
| best programmer I know, but when I started out in the
| industry at 22 I was working with developers with 10+ years
| of (real) experience on me...and I was able to come in and
| improve on their code to an extreme degree. I was completing
| my projects faster than other senior developers. With less
| than two years of experience in the industry I was promoted
| to "senior" developer and put on a project as lead (and sole)
| developer and my project was the only one to be completed on
| time, and with no defects. (This is video game industry, so
| it wasn't exactly a super-simple project; at the time this
| meant games written 100% in assembly language with all kinds
| of memory and performance constraints, and a single bug meant
| Nintendo would reject the image and make you fix the problem.
| We got our cartridge approved the first time through.)
|
| Some programmers are just faster and more intuitive with
| programming than others. This shouldn't be a surprise. Some
| writers are better and faster than others. Some artists are
| better and faster than others. Some architects are better and
| faster than others. Some product designers are better and
| faster than others. It's not _all_ about the number of hours
| of practice in any of these cases; yes, the best in a field
| often practices an insane amount. But the very top in each
| field, despite having similar numbers of hours of practice
| and experience, can vary in skill by an insane amount. Even
| some of the best in each field are vastly different in speed:
| You can have an artist who takes years to paint a single
| painting, and another who does several per week, but of
| similar ultimate quality. Humans have different aptitudes.
| This shouldn't even be controversial.
|
| I do wonder if the "learned programming at 12" has anything
| to do with it: Most people will only ever be able to speak a
| language as fluently as a native speaker if they learn it
| before they're about 13-14 years old. After that the brain
| (again, for most people; this isn't universal) apparently
| becomes less flexible. In MRI studies they can actually
| detect differences between the parts of the brain used to
| learn a foreign language as an adult vs. as a tween or early
| teen. So there's a chance that early exposure to the right
| concepts actually reshapes the brain. But that's just
| conjecture mixed with my intuition of the situation: When I
| observe "normal" developers program, it really feels like I'm
| a native speaker and they're trying to convert between an
| alien way of thinking about a problem into a foreign language
| they're not that familiar with.
|
| AND...there may not be a need to explicitly PROGRAM before
| you're 15 to be good at it as an adult. There are video games
| that exercise similar brain regions that could substitute for
| actual programming experience. AND I may be 100% wrong. Would
| be good for someone to fund some studies.
| eschneider wrote:
| I'm not even sure that coding _much_ faster than necessary is
| even required to give a 3-5x multiple on "average", let alone
| "worst case" developers. Some of the biggest productivity
| wins can be had by being able to look at requirements,
| knowing what's right or wrong about them, and getting
| everyone on the same page so the thing only needs to be made
| once. Being good at test and debug so problems are identified
| and fixed _early_ are also big wins. Lots of that is just
| having the experience to recognize what sort of problem
| you're dealing with very quickly.
|
| Being a programming prodigy is nice, but I don't think you
| even really need that.
| joantune wrote:
| Underrated comment
| confidantlake wrote:
| I am not convinced that just starting early is all there is
| to it. I started Math, Sports, and Piano at like 6 years old
| but there are still plenty of "10x <insert activity here>"
| people that figuratively and literally run circles around me.
| Talent is a real thing.
| joantune wrote:
| The intensity with which you did it matters, though. You probably
| didn't spend that many years on a specific sport for
| instance.
|
| And when we're talking about sports, genetics matter as
| well (depending on each one)
|
| When we're talking brains, while genetics also matter,
| assuming normal (whatever that is) brain, the plasticity
| changes a lot how it operates.
|
| So, the 10 years thing is definitely a big if not the
| biggest part. In my opinion. Would love to see studies if
| any exist out there on this
| VoodooJuJu wrote:
| 10x developer is just a buzzword people throw around when
| they're trying to sell you something.
| xnx wrote:
| Anyone can be a 10x engineer when they write something
| similar/identical to what they've written before. Other jobs
| are not like this. A plumber may only be 20% faster on the best
| days of their career.
| ManuelKiessling wrote:
| What makes a car go fast? The brakes:
|
| https://manuel.kiessling.net/2011/04/07/why-developing-witho...
| BobbyTables2 wrote:
| > People who implement something very few people even
| considered or understood to be possible, which then gives
| amazing leverage to deliver working software in a fraction of
| the time.
|
| I agree with the first part of your statement, but what really
| happens to such people?
|
| In my experience (sample size greater than one), they receive
| some kudos, but remain underpaid, never promoted, and are given
| more work under tight deadlines. At least until some of them
| are laid off along with lower performers.
|
| But for those who say that hard things are impossible, they
| seem to get along just fine. They merely declare such things as
| out-of-scope or lie about their roadmap.
| bedobi wrote:
| > In my experience (sample size greater than one), they
| receive some kudos, but remain underpaid, never promoted, and
| are given more work under tight deadlines. At least until
| some of them are laid off along with lower performers.
|
| 100% agree, I've seen plenty of the best of the best get
| treated like trash and laid off at first sight of trouble on
| the horizon
| hyperthesis wrote:
| I've always thought a x10 is one who sits back and sees a
| simpler way - like some math problems have an easy solution, if
| you can see it. Also: change the question; change the context
| (Alan Kay)
|
| (And absolutely not brute-force grinding themselves away)
| strangattractor wrote:
| 6.5 X 15 is only 97.5 hours per week, not even close to the 400
| hrs (10X40) per week of programming a 10X Rust programmer can
| provide. I jest, but all this 10X stuff is getting ridiculous.
| They stayed in "Stealth" mode because they didn't have anything
| worth showing for 5 years. Doesn't sound all that productive to
| me. More likely what they are trying to do was hard and
| complicated and took a while to figure out.
| joantune wrote:
| They're not boasting about their current productivity,
| they're boasting about the one they achieved at FoundationDB
| when they implemented the testing, which gave them the idea
| to build antithesis
| devjab wrote:
| In my experience it often comes down to business processes. We
| have a guy in my extended team who knows everything about his
| side of the company. When I work with him I accomplish business
| altering deliveries in a very short amount of time, which after
| a week or two rarely needs to be touched again unless something
| in the business changes. He's not a PO and we don't do anything
| too formally because it's just him, me and another developer +
| whatever business manager will benefit from the development
| (and a few testers from their team). In many ways the way we
| work these projects is very akin to Team Topologies.
|
| At other times I'll be assigned projects with regular POs,
| Architects and business employees who barely know what it is
| they are doing themselves, with poorly defined tasks and all
| sorts of bureaucratic nonsense "agile" process methods and we'll
| spend forever delivering nothing.
|
| So sometimes I'm a 50x developer delivering business altering
| changes. At other times I'm a useless cog in a sea of pseudo
| workers. I don't particularly care, I get paid, but if
| management actually knew what was going on, and how to change
| it... well...
| agentultra wrote:
| I got similar productivity boosts after learning TLA+ and Alloy.
|
| Simulation is an interesting approach, but I am curious: if they
| ever implemented the simulation wrong, would it report errors that
| don't happen on the target platform, or fail to find errors that
| the target platform reports? How wide the gap is will matter...
| and how many possible platforms and configurations will the
| hypervisor cover?
| ComputerGuru wrote:
| I was mentally hijacked into clicking the jobs link (despite
| recently deciding I wasn't going to go down that rabbit hole
| again!) but fortunately/unfortunately it is in-person and daily,
| so flying out from Chicago a week out of the month won't work
| and I don't even have to ask!
|
| More to the point of the story (though I do think the actual
| point was indeed a hiring or contracting pitch), this reminds me
| a lot of the internal tests the SQLite team has. I would love to
| hear from someone with access to those if they feel the same way.
| laiysb wrote:
| > I was mentally hijacked into clicking the jobs link (despite
| recently deciding I wasn't going to go down that rabbit hole
| again!) but fortunately/unfortunately it is in-person and daily,
| so flying out from Chicago a week out of the month won't
| work and I don't even have to ask!
|
| given their PLTR connection, probably not
| ComputerGuru wrote:
| Oh, suddenly I'm not interested, either! Thanks!
| timwis wrote:
| Gosh, I know it's a bit late, but I wish they'd called the
| product _The Prime Radiant_
|
| Fans of Asimov's _Foundation_ series will appreciate the analogue
| to how this system aims to predict every eventuality based on
| every possible combination of events, a la psychohistory.
|
| P.S. amazing intro post. Can't wait to try the product.
| rvnx wrote:
| It would be the opposite of the product:
|
| For software not interacting with the real world, there is
| only one possibility for frame N+1, if you know the state of a
| system.
|
| https://en.wikipedia.org/wiki/Determinism
|
| PRNGs are illusions, just misunderstood by humans.
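|
| A minimal Python sketch of that point, assuming only the standard
| library's random module: two generators given the same seed
| produce the same sequence, so the "randomness" is fully determined
| by the starting state. The helper name roll_sequence and the seed
| value are illustrative, not taken from the post.
|
```python
import random

def roll_sequence(seed, n=5):
    """Return n pseudo-random dice rolls from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator; global state is untouched
    return [rng.randint(1, 6) for _ in range(n)]

# Same seed, same "random" history, every time.
first = roll_sequence(seed=1234)
second = roll_sequence(seed=1234)
print(first == second)  # prints True
```
|
| Replaying a seed this way is what makes a randomized test run
| repeatable; it says nothing about inputs from the real world.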
| samatman wrote:
| Did you intend to link to
| https://en.wikipedia.org/wiki/Deterministic_algorithm?
| timwis wrote:
| Feels like I may have brought a spoon to a gun fight, but I
| would have considered psychohistory to be the ultimate
| extrapolation of determinism, and the fact that the prime
| radiant is able to predict _which_ version of events will
| happen is because it (somehow) knows the state of the system.
|
| Of course, to argue against myself, it would surely be based
| on layers of probabilities, and they say several times in the
| series that it can't predict low-level specific things, just
| high-level things. And perhaps the whole underlying question
| posed by the series is whether the universe really is
| deterministic. But anyway I don't think it's all off-base.
| amw-zero wrote:
| I'm trying to avoid diving into the hype cycle about this
| immediately - but this sounds like the holy grail right? Use your
| existing application as-is (assuming it's containerized), and
| simply check properties on it?
|
| The blocker in doing that has always been the foundations of our
| machines: non-deterministic CPUs and operating systems. Re-
| building an entire vertical computing stack is practically
| impossible, so they just _avoid_ it by building a high-fidelity
| deterministic simulator.
|
| I do wonder how they are checking for equivalence between the
| simulator and existing OS's, as that sounds like a non-trivial
| task. But, even still, I'm really bought in to this idea.
| wilkystyle wrote:
| Does it even _need_ to be containerized? According to the post,
| it sounds like Antithesis is a solution at the hypervisor
| layer.
| amw-zero wrote:
| Yes it looks like containerization is required:
| https://antithesis.com/docs/getting_started/setup.html#conta...
| voidmain wrote:
| Containers are doing two jobs for us: they give our
| customers a convenient way to send us software to run, and
| they give us a convenient place to simulate the network
| boundary between different machines in a distributed
| system. The whole guest operating system running the
| containers is also running inside the deterministic
| hypervisor and under test (and it's mostly just NixOS
| Linux, not something weird that we wrote).
|
| I'm a co-founder of Antithesis.
| tikhonj wrote:
| Oh, cool to hear you're using NixOS. The Nix philosophy
| totally gels with the philosophy described in the post.
|
| But it's also probably fair to describe NixOS as
| something weird that somebody else wrote :)
| xyzelement wrote:
| I appreciated this post. Separately from what they are talking
| about, I found this bit insightful:
|
| // This limits the value of testing, because if you had the
| foresight to write a test for a particular case, then you
| probably had the foresight to make the code handle that case too.
|
| I often felt this way when I saw developers feel a sense of doing
| good work and creating safe software because they wrote unit
| tests like expect add(2,2) = 4. There is basically a 1-1
| correlation between the cases you thought to test and the cases
| you coded for, which leaves you no better off in terms of
| unexplored scenarios.
|
| I get that this has some incremental value in catching blatant
| miscoding and regressions down the road so it's helpful, it's
| just not getting at the _main_ thing that will kill you.
|
| I felt similarly about human QA back in my finance days that
| asked developers for a test plan. If the dev writes a test plan,
| it also only covers what the dev already thought about. So I
| asked my team to write the vaguest/highest level test plan
| possible (eg, "it should now be possible to trade a Singaporean
| bond" rather than "type the Singaporean bond ticker into the
| field, type the amount, type the yield, click buy or sell") - the
| vagueness made more room for the QA person to do something
| different (even things like tabbing vs clicking, or filling the
| fields out of sequence, or misreading the labels) than how the
| dev saw it, which is the whole point.
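|
| The gap between "cases you thought of" and "cases that exist" can
| be sketched in Python. Everything here is hypothetical: max3 is a
| made-up function with a planted bug, and find_counterexample is a
| toy randomized search that uses the built-in max() as an oracle.
| It is a sketch of the general idea, not of how Antithesis works.
|
```python
import random

def max3(a, b, c):
    """Hypothetical buggy function: wrong when a == b and both exceed c."""
    if a > b and a > c:
        return a
    if b > a and b > c:
        return b
    return c

def find_counterexample(seed, trials=1000):
    """Search random small inputs for a case where max3 disagrees with max()."""
    rng = random.Random(seed)  # seeded, so any failure can be replayed exactly
    for _ in range(trials):
        args = tuple(rng.randint(0, 2) for _ in range(3))
        if max3(*args) != max(args):
            return args
    return None

assert max3(1, 2, 3) == 3            # the example the author thought of passes
print(find_counterexample(seed=0))   # an input with a == b > c
```
|
| The hand-written example passes, while the search turns up a tie
| that nobody wrote a test for.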
| jwr wrote:
| FoundationDB is an impressive achievement, quite possibly the
| only distributed database out there that lives up to its strict
| serializability claims (see
| https://jepsen.io/consistency/models/strict-serializable for a
| good definition). The way they wrote it is indeed very
| interesting and a tool that does this for other systems is
| immediately worth looking at.
| candiddevmike wrote:
| > quite possibly the only distributed database out there that
| lives up to its strict serializability claims
|
| Jepsen has never tested FoundationDB, not sure why you claim
| this and link to Jepsen's site.
| nlavezzo wrote:
| FDB co-founder here.
|
| Aphyr / Jepsen never tested FDB because, as he tweeted "their
| testing appears to be waaaay more rigorous than mine." We
| actually put a screen cap of that tweet in the blog post
| linked here.
| krisoft wrote:
| > not sure why you claim this and link to Jepsen's site.
|
| They link to the website for a definition of the term they
| are using.
| mcmoor wrote:
| Is it that good? I've been tasked to deploy it for some time and
| it always bit me in the ass for one reason or another. And I'm
| not the one who uses it so I don't know if it's actually good.
| For now I much prefer redis.
| foobiekr wrote:
| It's great, but operationally there are lots of gotchas and
| little guidance.
|
| We got bitten _hard_ in production when we accidentally
| allowed some of the nodes to get above 90% of the storage
| used. The whole database collapsed into a state where it
| could only do a few transactions a second. Then the ops team,
| thinking they were clever, doubled the size of the cluster in
| order to give it the resources it needed to get the average
| utilization down to 45%; this was an unforced error as that
| pushed the size of the cluster outside the fdb comfort zone
| (120 nodes) which is itself a problem. The deed was done
| though and pulling nodes was not possible in this state, so
| slowly, slooooowly... things got fixed.
|
| We ended up spending an entire weekend slowly, slowly getting
| things back into a good place. We did not lose data, but
| basically prod was down for the duration, and we found it
| necessary to _manually_ evict the full nodes one at a time
| over the period.
|
| Now, this was a few years ago, and fdb has performed wickedly
| fast, with utter, total reliability before that and since,
| and to this day the ops team is butthurt about fdb.
|
| From an engineering perspective, if you aren't using java, fdb
| is pretty not great, since the very limited number of
| abstraction layers that exist are all java-centric. There are
| many, many issues with the maximum transaction time thing,
| the maximum key size and value size and total transaction
| size issue, the lack of pushdown predicates (e.g., filtered
| scans can't be done in-place which means that in AWS, they
| cost a lot in inter-az network charge terms and also are
| gated by the network performance of your instances), and so
| on.
|
| What ALL of these issues have in common is that they
| bite you late in the game. The storage issue bites you when
| you're hitting the DB hard in production and have a big data
| set, the lack of abstractions means that even something like
| finding leaked junked keys turns out to be impossible unless
| you were diligent to manually frame all your values so you
| could identify things as more than just bytes, the
| transaction time thing is very weird to deal with as you tend
| to have creeping crud aspects and the lack of libraries that
| instrument the transactions to give you early warning is an
| issue, likewise for certain kinds of key-value pairs, there's
| a creeping size problem - hey, this value is an index of
| other values; if you're not very careful up front, you _will_
| eventually hit either the txn size limit or the key limit.
| The usual workaround for those is to do separate
| transactions - a staging transaction, then essentially a swap
| operation and then a garbage collection transaction - but
| that has lots of issues over time when coupled with
| application failure.
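|
| The "split it into separate transactions" step can be sketched in
| plain Python. This does not call any FoundationDB API; the
| function split_into_batches and the max_batch_bytes budget are
| assumptions for illustration, and real size accounting involves
| more than key and value lengths. It only groups writes under a
| byte budget and does nothing about the staging, swap, or cleanup
| problems described above.
|
```python
def split_into_batches(items, max_batch_bytes):
    """Group (key, value) byte pairs into batches whose total size stays
    within max_batch_bytes, preserving order."""
    batches, current, current_size = [], [], 0
    for key, value in items:
        size = len(key) + len(value)
        if size > max_batch_bytes:
            raise ValueError("single pair exceeds the per-batch budget")
        if current and current_size + size > max_batch_bytes:
            batches.append(current)      # budget would overflow: start a new batch
            current, current_size = [], 0
        current.append((key, value))
        current_size += size
    if current:
        batches.append(current)
    return batches

# Hypothetical index that has crept up to 25 pairs of 40 bytes each.
pairs = [(b"idx/%04d" % i, b"x" * 32) for i in range(25)]
batches = split_into_batches(pairs, max_batch_bytes=400)
print([len(b) for b in batches])  # prints [10, 10, 5]
```
|
| Each batch would then be written in its own transaction, which is
| exactly where the application-failure issues start.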
|
| There are answers to ALL of these, manual ones. For the
| popular languages other than java - Go, python, maybe Ruby -
| there _should_ be answers for them, but there aren't. These
| are very sharp edges. Those java layers are _also_ _not_
| _bug_ _free_. So yeah, one has a reliable storage
| layer (a topic that has come up over and over again in the
| last few years) but it's the layer on top of that where all
| the bugs are, but now with constraints and factors that are
| harder to reason about than the usual storage layer.
|
| One might say, hey, SQL has all of these problems too, except
| no. You can bump into transaction limits, but the limits are
| vastly higher than fdb and the transaction time sluggishness
| will identify it long before you run into the "your
| transaction is rejected, spin retrying something that will
| _never_ recover" sort of issue that your average developer
| will eventually encounter in fdb.
|
| That said, I love fdb as a software achievement. I just wish
| they had finished it. For my current project, I have designed
| it out. I might be able to avoid all of the sharp edges above
| at this point, but since we are not a java shop, I also can't
| rely on all the engineers to even know they exist.
| jwr wrote:
| It depends how you define "good". I care mostly about my
| distributed database being correct, living up to its
| consistency claims, and providing strict serializability.
|
| (see also https://aphyr.com/posts/283-jepsen-redis)
|
| I care much less about how easy it is to use or deploy, but
| "good" is a subjective term, so other people might see things
| differently.
| aduffy wrote:
| Looks like this coincides with seed funding[1], congrats folks!
| Did you guys just bootstrap through the last 5 years of
| development?
|
| [1] https://www.saltwire.com/cape-breton/business/code-
| testing-s...
| samsquire wrote:
| This is really exciting.
|
| I am an absolute beginner at TLA+ but I really like this possible
| design space.
|
| I have an idea for a package manager that combines type system
| with this style of deterministic testing and state space
| exploration.
|
| Imagine knowing that your invocation of
| package-manager install <tool name>
|
| Will always work because file system and OS state are part of the
| deterministic model.
|
| or a next-gen Helm with a type system and tested state space
| exploration: kubectl apply <yaml>
|
| will always work when it comes up because all configuration state
| space exploration has been tested thanks to types.
| __MatrixMan__ wrote:
| Coincidence, I'm reading this and thinking about test harnesses
| for my package manager idea, which is really just a thin
| wrapper around nix, designed under the assumption that the
| network might partition at any moment: keep the data nearest
| where it's needed, refer by hash not by name, gossip metadata
| necessary to find the hash for you, no single points of
| failure.
|
| Tell me more about yours?
| samsquire wrote:
| I am thinking about state machine progressions and TLA+ style
| specifications which are invariants over a progression of
| variables.
|
| Your package manager knows your operating system's current
| state and the state space that the control flow graph of the
| program and configuration together can reach, so it can verify
| that everything lines up and that there will be no error when
| executed, a bit like a compiler but without running into the
| Halting problem.
|
| In TLA+ you can dump a state graph as a dot file, which I
| turn into an SVG and run with TLA+ graph visualiser.
|
| Types verify possible control flow is valid at every point.
| We just need to add types to the operating system and file
| system and represent state space for deterministic
| verification.
|
| You could hide packages that won't work.
|
| The package manager would have to look up precached state
| spaces or download them as part of the verification process.
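|
| The idea of exploring a state space and checking an invariant at
| every state can be sketched in Python. This is a toy, not TLA+:
| the DEPS table, the deliberately naive remove step, and the
| function names are all hypothetical, and a real package manager's
| state space would be far larger.
|
```python
from collections import deque

# Hypothetical dependency table: "app" needs "lib"; "lib" needs nothing.
DEPS = {"app": frozenset({"lib"}), "lib": frozenset()}

def invariant(state):
    """Every installed package has all of its dependencies installed."""
    return all(DEPS[pkg] <= state for pkg in state)

def next_states(state):
    """All states reachable in one step of a deliberately naive manager."""
    for pkg, deps in DEPS.items():
        if pkg not in state and deps <= state:
            yield state | {pkg}   # install: allowed only once deps are present
    for pkg in state:
        yield state - {pkg}       # remove: never checks for dependents (the bug)

def find_violation(initial=frozenset()):
    """Breadth-first search; returns the shortest path to a bad state, or None."""
    seen = {initial}
    queue = deque([[initial]])
    while queue:
        path = queue.popleft()
        state = path[-1]
        if not invariant(state):
            return path
        for nxt in next_states(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

for step in find_violation():
    print(sorted(step))
```
|
| The trace it prints is: nothing installed, then lib, then app and
| lib, then app alone, which is the state a checker would want to
| rule out before running the command.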
| traspler wrote:
| Checking their bug report which should contain "detailed
| information about a particular bug" I am not sure I can fully
| understand those claims:
| https://public.antithesis.com/report/ZsfkRkU58VYYW1yRVF8zsvU...
|
| To my untrained eye I get: Logs, a graph of when in time the bug
| happened over multiple runs and a statistical analysis of which part
| of the application code could be involved. The statistical
| analysis is nice but it is completely flat, without any
| hierarchical relationships making it quite hard to parse
| mentally.
|
| I kind of expected more context to be provided about inputs,
| steps and systems that lead to the bug. Is it expected to then
| start adding all the logging/debugging that might be missing from
| the logs and re-run it to track it down? I hoped that given the
| deterministic systems and inputs there could be more initial
| hints provided.
| Invictus0 wrote:
| Talk about bad writing. If I don't know what the hell your thing
| is in the first paragraph, I'm not going to read your whole blog
| post to find out. Homepage is just as bad.
| tranceylc wrote:
| The article is more of a history lesson and context than it is
| an ad. I see what you mean, but clicking "product -> What Is
| Antithesis?" shows a clear description of what it does. Perhaps
| that could also either be added to the article or the home
| page?
| agumonkey wrote:
| interesting, this kind of responsive environment is dear but rare
|
| i can't recall the last time i went to a place and people even
| considered investing in such setups
|
| i assume that except for hard problems and teams seeking
| challenges, most people will revert to the mean and refuse any
| kind of infrastructure work because it's mentally more
| comfortable piling features and fixing bugs later
|
| ps: i wish there was a meetup of teams like this, or even job
| boards :)
| nlavezzo wrote:
| We'll be starting some meetups, attending conferences, etc.
| this year. Also hop into our Discord if you want to chat, lots
| of us are in there regularly. discord.gg/antithesis
| agumonkey wrote:
| oh, that's cool, thanks
| shermantanktop wrote:
| I kept cringing when I read the words "no bugs."
|
| This is hubris in the classic style - it's asking for a literal
| thunderbolt from the heavens.
|
| It may be true, but...come on.
|
| Everyone who has ever written a program has thought they were
| done only to find one more bug. It's the fundamental experience
| of programming to asymptotically approach zero bugs but never
| actually get there.
|
| Again, perhaps the claim is true but it goes against my instincts
| to entertain the possibility.
| rkangel wrote:
| I think there is something interesting about the fact that
| someone writing "no bugs" makes us all uncomfortable.
|
| If they really did have a complex product, running in
| production with a sizeable userbase and had 2 bug reports ever,
| then I think it's a reasonable thing to say.
|
| The fact that it _isn't_ a reasonable thing to say for most
| other software is a little sad.
| shermantanktop wrote:
| Right, the claim may be true, but I have a visceral reaction
| to it. And tbh I'd be hesitant to work with someone who made
| a zero-bugs claim about their own work.
| sfink wrote:
| Yeah, same. It suggests that you must be employing one of the
| time-honored approaches to getting zero bugs:
|
| * Redefine all bugs as features
|
| * Redefine "bug" to conveniently only apply to the things your
| system prevents
|
| * Don't write software
|
| This reminds me of bugzilla's "Zarro Boogs" phrase that
| pointedly avoids saying "Zero Bugs" because it's such a
| deceptive term, see https://en.wikipedia.org/wiki/Bugzilla
|
| Being able to say "no bugs" with justifiable confidence, even
| when restricting it to some class of bugs, is truly a great and
| significant thing. cf Rust. But claiming to have no bugs is
| cringeworthy.
| chinchilla2020 wrote:
| It's a marketing blog post, not a technical post. Something
| about the whole thing feels icky.
| sackfield wrote:
| "At FoundationDB, once we hit the point of having ~zero bugs and
| confidence that any new ones would be found immediately, we
| entered into this blessed condition and we flew. Programming in
| this state is like living life surrounded by a force field that
| protects you from all harm. Suddenly, you feel like you can take
| risks"
|
| When this state hits it really is a thing to behold. It's very
| empowering to trust your system to this extent, and to know if
| you introduce a bug a test will save you.
| dkyc wrote:
| On mobile, the "Let's talk" button in the top right corner is cut
| off by the carousel menu overlay. Seems like CSS is still out of
| scope of the bug fixing magic for now.
|
| On a more serious note, it's an interesting blog post, but it
| comes off as veeery confident about what is clearly an incredibly
| broad and complex topic. Curious to see how it will work in
| production.
| wruza wrote:
| Yeah, if only there was some scientific way to ensure that
| elements don't overlap, let's call it "constraints" maybe, so
| one could test layouts by simply solving, idk... something like
| a set of linear equations? Hope some day CSS will stop being
| "aweso"me and become nothing in favor of a useful layout
| system.
| wwilson wrote:
| Aww... crap, you're right. I knew we should have finished the
| UI testing product and run it on ourselves before launching.
|
| Disclosure: Antithesis co-founder.
| terpimost wrote:
| Designer here, sorry, it is intentional. I thought horizontally
| scrollable menu is more straightforward than full screen
| expander.
| thomastraum wrote:
| https://antithesis.com/images/people/will.jpg the look of the CEO
| is selling the software to me automatically. reliable and nice
| islandert wrote:
| There's a straightforward way to reach this testing state for
| optimization problems. Write 2 implementations of the code, one
| that is simple/slow and one that is optimized. Generate random
| inputs and assert outputs match correctly.
|
| I've used this for leetcode-style problems and have never failed
| on correctness.
|
| It is liberating to code in systems that test like this for the
| exact reasons mentioned in the article.
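|
| A minimal sketch of that setup, assuming a maximum-subarray
| problem as the stand-in (the problem and the function names are
| illustrative, not from the comment above): a brute-force
| reference, an optimized version, and a seeded loop that asserts
| they agree.

```python
import random


def max_subarray_slow(xs):
    """Reference implementation: try every non-empty contiguous slice."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def max_subarray_fast(xs):
    """Optimized implementation: Kadane's algorithm, one pass."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def differential_test(trials=2000, seed=0):
    """Feed the same random inputs to both versions and compare outputs."""
    rng = random.Random(seed)  # seeded so any failure can be replayed
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        expected = max_subarray_slow(xs)
        actual = max_subarray_fast(xs)
        assert actual == expected, f"mismatch on {xs}: {actual} != {expected}"
    return trials


if __name__ == "__main__":
    print(f"{differential_test()} random cases agreed")
```

| Small inputs are deliberate: they keep the slow version cheap
| and make any failing case easy to read.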
| mrkeen wrote:
| Non-overlapping problem spaces.
|
| Leet-code ends in unit-testing land, this product begins in
| system-testing land.
| Qwuke wrote:
| I met Antithesis at Strangeloop this year and got to talk to
| employees about the state of the art of automated fault injection
| that I was following when I worked at Amazon, and I cannot
| overstate how their product is a huge leap forward compared to
| many of the formal verification systems being used today.
|
| I actually got to follow their bug tracking process on an issue
| they identified in Apache Spark streaming - going off of the
| docs, they managed to identify a subtle and insidious correctness
| error in a common operation that would've caused headaches in
| low-visibility edge cases for years at that point. In the end the docs
| were incorrect, but after that showing I cannot imagine how
| critical tools like Antithesis will be inside companies building
| distributed systems.
|
| I hope we get some blog posts that dig into the technical weeds
| soon, I'd love to hear what brought them to their current
| approach.
| BoppreH wrote:
| Three thoughts:
|
| 1. It's a brilliant idea that came at the right time. It feels
| like people are finally losing patience with flaky software, see
| developer sentiment on: fuzzers, static typing, memory safety,
| standardized protocols, containers, etc.
|
| 2. It's meant to be niche. $2 per hour per CPU (or $7000 per year
| per CPU if reserved), no free tier for hobby or FOSS, and the
| only way to try/buy is to contact them. Ouch. It's a valid
| business model, I'm just sad it's not going for maximum positive
| impact.
|
| 3. Kudos for the high quality writing and documentation, and I
| absolutely love that the docs include things like (emphasis in
| original):
|
| > If a bug is found in production, or by your customers, _you
| should demand an explanation from us_.
|
| That's exactly how you buy developer goodwill. Reminds me of
| Mullvad, who I still recommend to people even after they dropped
| the ball on me.
| wwilson wrote:
| Thanks for your kind words! As I mention in this comment
| (https://news.ycombinator.com/item?id=39358526) we are planning
| to have pricing suitable for small teams, and perhaps even a
| free tier for FOSS, in the future.
|
| Disclosure: Antithesis co-founder.
| eatonphil wrote:
| There are a few FOSS projects I'd love to set this up for if you
| ever get to the free tier. :)
| jerf wrote:
| "It's meant to be niche. $2 per hour per CPU (or $7000 per year
| per CPU if reserved), no free tier for hobby or FOSS, and the
| only way to try/buy is to contact them. Ouch. It's a valid
| business model, I'm just sad it's not going for maximum
| positive impact."
|
| This is the sort of thing that, if it takes off, will start
| affecting the entire software world. Hardware will start adding
| features to support it. In 30 years this may simply be how
| computing works. But the pioneers need to recover the costs of
| the arrows they got stuck with before it can really spread out.
| Don't look at this as an event, but as the beginning of a process.
| whatshisface wrote:
| $2 per hour per CPU could be expensive or inexpensive,
| depending on how long it takes to fuzz your program. I wonder
| how that multiplies out in real use cases?
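|
| For a rough sense of scale, here is the arithmetic using only
| the two prices quoted upthread ($2 per CPU-hour on demand, $7000
| per CPU-year reserved). The figures are taken from the thread as
| given, and the helper name is made up for illustration.

```python
HOURLY_RATE = 2.0           # dollars per CPU-hour, as quoted in the thread
RESERVED_PER_YEAR = 7000.0  # dollars per CPU-year, as quoted in the thread
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(cpus, hours):
    """Cost in dollars of running `cpus` CPUs for `hours` hours on demand."""
    return HOURLY_RATE * cpus * hours


# Hours of on-demand use per CPU at which reserving becomes cheaper.
break_even_hours = RESERVED_PER_YEAR / HOURLY_RATE

# What one CPU would cost if it ran on demand all year.
full_year_on_demand = on_demand_cost(1, HOURS_PER_YEAR)

if __name__ == "__main__":
    print(break_even_hours, full_year_on_demand)
```

| On those numbers the break-even is 3500 CPU-hours, roughly 40%
| utilization of one CPU over a year, and a CPU kept busy all year
| on demand would come to $17,520.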
| benrutter wrote:
| This is a great pitch, and I don't want to come across as
| negative, but I feel like a statement like "we found all bugs"
| can only be true with a very narrow definition of bug.
|
| The most pernicious, hard-to-find bugs that I've come across have
| all been around the business logic of an application, rather than
| it hitting into an error state. I'm thinking of the category
| where you have something like "a database is currently reporting
| a completed transaction against a customer, but no completed
| purchase item, how should it be displayed on the customer _recent
| transactions_ page?". Implementing something where "a thing will
| appear and not crash" in those cases is one thing, but making
| sure that it actually makes sense as a choice given all the
| context of everyone else's choices everywhere else in the stack is
| a lot harder.
|
| Or to take a database, something along the lines of "our query
| planner produces a really suboptimal plan in this edge-case".
|
| Neither of those types of problems could ever be automatically
| detected, because they aren't issues of the program reaching
| an error state; the issue is figuring out in the first place what
| "correct" actually is for your application.
|
| Maybe I'm setting the bar too high for what a "bug" is, but I
| guess my point is, it's one thing to fantasize about having zero
| bugs, it's another to build software in the real world. I'd probably
| still settle for 0 run time errors though to be fair. . .
| moritonal wrote:
| Good summary of the hard part of being a software developer
| that deals with clients.
| Aachen wrote:
| What software developer does not deal with clients (and makes
| a living)?
| ejb999 wrote:
| lots of software developers never deal with clients
| (clients as in the people who will actually use the
| software) - most of them in fact, in any of the big
| companies I have worked for anyway...and that is probably
| not a good thing.
|
| I myself, prefer to work with the people who will actually
| use what I build - get a better product that way.
| adamauckland wrote:
| I consider a "bug" to be "it was supposed to do something and
| failed".
|
| Issues around business logic are not failures of the system,
| the system worked to spec, the spec was not comprehensive
| enough and now we iterate.
| repelsteeltje wrote:
| ...And now we could probably start debating your narrow
| definition of "system". ;-)
| pipo234 wrote:
| Most of the software I've built doesn't have "a spec.", but
| let me zoom in on specs. around streaming media. MPEG DASH,
| CMAF or even the base media file format (ISO/IEC 14496-12)
| at times can be pretty vague. In practice, this frequently
| turns up in actual interoperability issues where it's
| pretty difficult to point out which of two products is
| according to spec and which one has a bug.
|
| So yes, I totally agree with GP and would actually go
| further: a phrase like "we found all the bugs in the
| database" is nonsense and makes the article less credible.
| Aachen wrote:
| What do you call it when the spec is wrong? Like clearly
| actually wrong, such as when someone copied a paragraph from
| one CRUD-describing page to the next and forgot to change the
| word "thing1" to "thing2" in the delete description.
|
| Because I'd call that a bug. A spec bug, but a bug. It's no
| feature request to make the code based on the newer page
| delete thing2 rather than thing1, it's fixing a defect
| pinkmuffinere wrote:
| Ya, I would like a word for this as well. I naturally refer
| to this category of error as bug, but this occasionally
| leads to significant conflict with others at work. I now
| default to calling _almost everything_ a feature request,
| which is obviously dumb but less likely to get me into
| trouble. If there is a better word for "it does exactly
| what we planned, but what we planned was wrong" I would
| love to adopt it.
| Aachen wrote:
| I reported such a bug to some software my company uses
| (Tempo). Vendor proceeds to call it a feature request
| because the software _successfully fails_ to show public
| information (visible in the UI, but HTTP 403 in the API
| unless you're an admin).
|
| Instead of changing one word in the code that defines the
| access level required for this GET call, it gets triaged
| as not being a bug, put on a backlog, and we never heard
| from it again obviously
|
| We pay for this shit
| SilasX wrote:
| There's the distinction between correctness and fitness for
| purpose which I think is helpful for clarifying the issues
| here.
|
| Correctness bug: it didn't do what the spec says it should
| do.
|
| Fitness for purpose bug: it does what the spec says to do,
| but, with better knowledge, the spec isn't what you
| actually want.
|
| Edit: looks like this maps, respectively, to failing
| verification and failing validation.
| https://news.ycombinator.com/item?id=39359673
|
| Edit2: My earlier comment on the different things that get
| called "bugs", before I was aware of this terminology:
| https://news.ycombinator.com/item?id=22259973
| rkangel wrote:
| Systems Engineering has terminology for this distinction.
|
| Verification is "does this thing do what I asked it to do".
|
| Validation is "did I ask it to do the right thing".
| crashabr wrote:
| Tangentially related, but I've recently started
| distinguishing verification and validation in my data
| cleaning work:
|
| verification refers to "is this dataset clean?" or the more
| precise "does this dataset confirm my assumptions about
| what a correct dataset should be given its focus"
|
| validation refers to "can it answer my questions?" or the
| more rigorous "can I test my hypotheses against this
| dataset?"
|
| So I find this interesting (but in hindsight unsurprising)
| that similar definitions are used in other fields. Would
| you have a source for your definitions?
| rkangel wrote:
| They're fairly standard terms from "old style" project
| management - they show up in the usual V Model of
| Waterfall vein.
|
| E.g. see Wikipedia:
| https://en.m.wikipedia.org/wiki/Verification_and_validation
| zestyping wrote:
| A spec bug is just as bad as a code bug! Declaring a system
| free of defects because it matches the spec is sneaky
| sleight-of-hand that ignores the costs of having a spec.
|
| The actual testing value is the difference between the cost
| of writing and maintaining the code, and the cost of writing
| and maintaining the spec.
|
| If the spec is similar in complexity to the code itself, then
| bugs in the spec are just as likely as bugs in the code, thus
| verification to spec has gained you nothing (and probably
| cost you a lot).
| amw-zero wrote:
| I do think that it was a mistake to use the word "all" and
| imply that there are absolutely no bugs in FoundationDB.
| However, FoundationDB is truly known as having advanced the
| state of the art for testing practices:
| https://apple.github.io/foundationdb/testing.html.
|
| So in normal cases this would reek of someone being arrogant /
| overconfident, but here they really have gotten very close to
| zero bugs.
| spinningD20 wrote:
| The other issue I would point out is that building a
| database, while impressive with their quality, is still
| fundamentally different than an application or set of
| applications like a larger SaaS offering would involve (api,
| web, mobile, etc). Like the difference between API and UI
| test strategies, where API has much more clearly defined and
| standardized inputs and outputs.
|
| To be clear, I am not saying that you can't define all inputs
| and outputs of a "complete SaaS product offering stack",
| because you likely could, though if it's already been built
| by someone that doesn't have these things in mind, then it's
| a different problem space to find bugs.
|
| As someone who has spent the last 15 years championing
| quality strategy for companies and training folks of varying
| roles on how to properly assess risk, it does indeed feel
| like this has a more narrow scope of "bug" as a definition,
| in the sort of way that a developer could try to claim that
| robust unit tests would catch "any" bugs, or even most of
| them. The types of risk to a software's quality have larger
| surface areas than at that level.
| amw-zero wrote:
| There's a lot of assertions that I throw into business
| applications that would be very useful to test in this way.
| So I don't think this only applies to testing databases.
|
| Also, when properties are difficult to think of, that often
| means that a model of the behavior might be more
| appropriate to test against, e.g.
| https://concerningquality.com/model-based-testing/. It
| would take a bit of design work to get this to play nicely
| with the Antithesis approach, but it's definitely doable.
| spinningD20 wrote:
| Just to clarify, I am definitely not saying this is only
| useful or only applies to databases.
|
| The point was more that, I don't see how this testing
| approach (at the level that it functions) would catch all
| of the bugs that I have seen in my career, and so to say
| "all of the bugs" or even "most of the bugs" is
| definitely a stretch.
|
| This is certainly useful, just like unit tests,
| assertions, etc are all very useful. It's just not the
| whole picture of "bugs".
| amw-zero wrote:
| Yes, there are plenty of non-functional logic bugs, e.g.
| performance issues. I think this starts to drastically
| hone in on the set of "all" bugs though, especially by
| doing things like network fault injection by default.
| This will trigger complex interactions between
| dependencies that are likely almost never tested.
|
| They should clarify that this is focused on functional
| logic bugs though, I agree with that.
| nlavezzo wrote:
| I think the reference to "all the bugs" here is basically that
| our insanely brutal deterministic testing system was not
| finding any more bugs after 100's of thousands of runs. Can't
| prove a negative obviously, but the fact that we'd gotten to
| that "all green" status gave us a ton of confidence to push
| forward in feature development, believing we were building on
| something solid - which, time has shown we were.
| dap wrote:
| Thanks -- that's very clarifying! But isn't this circular?
| The lack of bugs is used as evidence of the effectiveness of
| the testing approach, but the testing approach is validated
| by...not finding any more bugs in the software?
| FridgeSeal wrote:
| Yeah but if your software is running in an environment that
| controls for a lot of non-determinism _and_ can simulate
| various kinds of failures and degradations at varying
| rates, and do it all in accelerated time and your software
| is still working correctly; I think it'd be somewhat
| reasonable to assert that maybe the testing setup has done
| a pretty good job.
| dap wrote:
| Agreed, the approach sounds very interesting and I can
| see how it could be very effective! I'd love to try it on
| my own stuff. That's why it's so surprising (to me) to
| claim that the approach found nearly every bug in
| something as complicated as a production distributed
| database. My career experience tells me (quite strongly)
| that can't possibly be true.
| dap wrote:
| The best definition I've heard for "bug" is "software not
| working as documented". Of course, a lot of software is lacking
| documentation -- and those are doc bugs. But I like this
| definition because even when the docs are incomplete, the
| definition guides you to ask: would I really document that the
| software behaves like this or would I change the behavior [and
| document that]? It's harder (at least for me) to sweep goofy
| behavior under the rug.
| oconnor663 wrote:
| To be fair, the line right after that is "I know, I know,
| that's an insane thing to say."
| pshc wrote:
| I feel like business logic bugs live on a separate layer, the
| application layer, and it's not fair to count those against the
| database itself.
|
| I agree that suboptimal query planning would be a database-
| layer bug, a defect which could easily be missed by the bug-
| testing framework.
| aftbit wrote:
| This "no bugs" maximalism is counterproductive. There are many
| classes of bugs that this cannot hope to handle. For example,
| let's say I have a transaction processing application that speaks
| to Stripe to handle the credit card flow. What happens if Stripe
| begins sending webhooks showing that it rejected my transactions
| but reports them as completed successfully when I poll them? The
| need to "delete all of our dependencies" (I presume they wrote
| their own OS kernel too?) in FoundationDB shows that upstream
| bugs will always sneak through this tooling.
| mempko wrote:
| In my career I learned two powerful tools to get bug free code.
| Design by Contract and Randomized testing.
|
| I had to roll this by myself for each project I did. Antithesis
| seems to systematize it and created great tooling around it.
| That's Great!!!
|
| However, looking at their docs they rely on assertion failures to
| find bugs. I believe Antithesis has a missed opportunity here by
| not properly pushing for Design by Contract instead of generic
| use of assertions. They don't even mention Design by Contract. I
| suspect the vast majority of people here on HN have never heard
| of it.
|
| They should create a Design by Contract SDK for languages that
| don't have one (think most languages) that interacts nicely with
| tooling and only fallback to generic assertions when their SDK is
| not available. A Design by Contract SDK would provide better
| error messages over generic assertions, further helping users
| solve bugs. In fact, their testing framework is useless without
| contracts being diligently used. It requires a different training
| and mindset from engineers. Teaching them Design by Contract puts
| them in that frame of mind.
|
| They have an opportunity to teach Design by Contract to a new
| generation of engineers. I'm surprised they don't even mention
| it.
| mrkeen wrote:
| I've never gotten anything more out of DbC than it being
| assertions and if-statements, but described using fancy
| English. I even worked with the creator of C4J a few years ago.
| mempko wrote:
| The primary benefit imo is
|
| * Way of thinking and discipline. Instead of ad hoc
| assertions, you deliberately state in code "These are the
| preconditions, invariants, and postconditions" of this
| function/module
|
| * Better error messages.
|
| * Better documentation (can automate extracting the contracts
| as documentation).
|
| * Better tooling. Can automate creating tests from
| preconditions. You can sample the function's input space and
| make sure invariants and postconditions hold.
|
| It's like, do you name all your functions 'func a1, func a2,
| func a3' or do you provide better names?
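|
| A rough sketch of what those bullets could look like in a
| language without built-in contracts. Everything here is
| hypothetical: the `contract` decorator, `ContractError`, and
| `sample_contract` are illustrative names, not part of any
| existing SDK, and `isqrt` is just a small function to hang the
| contract on.

```python
import functools
import random


class ContractError(AssertionError):
    """Raised when a precondition or postcondition is violated."""


def contract(pre=None, post=None):
    """Attach a precondition on the arguments and a postcondition on the result."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if pre is not None and not pre(*args):
                raise ContractError(
                    f"precondition of {fn.__name__} failed for args={args!r}")
            result = fn(*args)
            if post is not None and not post(result, *args):
                raise ContractError(
                    f"postcondition of {fn.__name__} failed: "
                    f"args={args!r}, result={result!r}")
            return result
        wrapper.pre = pre  # exposed so tooling can sample the input space
        return wrapper
    return decorate


@contract(
    pre=lambda n: isinstance(n, int) and n >= 0,
    post=lambda r, n: r * r <= n < (r + 1) * (r + 1),
)
def isqrt(n):
    """Integer square root by Newton's method."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x


def sample_contract(fn, gen, trials=1000, seed=0):
    """Generate inputs, keep those meeting the precondition, and call fn on them."""
    rng = random.Random(seed)
    checked = 0
    for _ in range(trials):
        arg = gen(rng)
        if fn.pre(arg):
            fn(arg)  # raises ContractError if the postcondition fails
            checked += 1
    return checked


if __name__ == "__main__":
    print(sample_contract(isqrt, lambda rng: rng.randint(-50, 10**6)))
```

| The contract doubles as the test oracle: the sampler never has
| to know what `isqrt` should return, only which inputs are legal.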
| 0xbadcafebee wrote:
| > The biggest effect was that it gave our tiny engineering team
| the productivity of a team 50x its size.
|
| 49 years ago, a man named Fred Brooks published a book, wherein
| he postulated that adding people to a late software project makes
| it later. It's staggering that 49 years later, people are still
| discovering that having a larger engineering team does not make
| your work more productive (or better). So what does make work
| more productive?
|
| Productivity requires efficiency. Efficiency is expensive,
| complicated, nuanced, curt. You can't just start out from day 1
| with an efficient team or company. It has to be grown,
| intentionally, continuously, like a garden of fragile flowers in
| a harsh environment.
|
| Is the soil's pH right? Good. Is it getting enough sun? Good.
| Wait, is that leaf a little yellow? Might need to shade it. Hmm,
| are we watering it too much? Let's change some things and see.
| Ok, doing better now. Ah, it's growing fast now. Let's trim some
| of those lower leaves. Hmm, it's looking a little tall, is it
| growing too fast? Maybe it does need more sun after all.
|
| If you really pay attention, and continue to make changes towards
| the goal of efficiency, you'll get there. No need for a 10x
| developer or 3 billion dollars. You just have to listen, look,
| change, measure, repeat. Eventually you'll feel the magic of
| zooming along productively. But you have to keep your eye on it
| until it blooms. And then keep it blooming...
| Communitivity wrote:
| There are situations where no bugs is an important requirement,
| if it means no bugs that cause a noticeable failure. Things such
| as planes, submarines, nuclear reactors. For those there is
| provably correct code. That takes a long time to write, and I
| mean a really long time. Applying that to all software doesn't
| make sense from a commercial perspective. There are areas where
| improvements can have a big impact though, such as language
| safety improvements (Rust) and cybersecurity requirements
| regarding private data protection. I see those as being the
| biggest win.
|
| I don't see no bugs in a distributed database as important enough
| to delay shipping for 5 years, but (a) it's not my baby; (b) I
| don't know what industries/use cases they are targeting. For me
| it's much more important to ship something with no critical bugs
| early, get user feedback, iterate, then rinse and repeat
| continually.
| amw-zero wrote:
| This is a false dichotomy though. The proposed approach here
| has a (theoretically) great cost to value ratio. Spending time
| on a workload generation process, and adding some asserts to
| your code is much lower cost than hand-writing tens of
| thousands of test cases.
|
| So it's not that this approach is only useful for critical
| applications, it's that it's low-cost enough to potentially
| speed up "regular" business application testing.
| 0xbadcafebee wrote:
| A lot of people underestimate the power of QA. Yeah, it would
| be great if we could just perfectly engineer something out of
| the gate. But you can also just take several months to stare at
| something, poke at it, jiggle it, and fix every conceivable
| problem, before shipping it. Heresy in the software world, but
| in every other part of the world it's called quality.
| mrkeen wrote:
| > I don't see no bugs in a distributed database as important
| enough to delay shipping for 5 years
|
| The marketplace has enough distributed databases with bugs.
| There's a nice catalogue of them at jepsen.io.
|
| > For me it's much more important to ship something with no
| critical bugs early, get user feedback, iterate, then rinse and
| repeat continually.
|
| * You can't really choose which bugs are critical if you're
| selling a database. A lost write is as critical as the customer
| deems it is.
|
| * You're not limited to your own users' feedback. There's
| plenty of users out there who disapprove of a buggy database,
| so you can probably take their views onboard before release.
| fleaflicker wrote:
| Business value is a good way to think about it:
|
| > As a software developer, fixing bugs is a good thing. Right?
| Isn't it always a good thing?
|
| > No!
|
| > Fixing bugs is only important when the value of having the bug
| fixed exceeds the cost of the fixing it.
|
| https://www.joelonsoftware.com/2001/07/31/hard-assed-bug-fix...
| coolThingsFirst wrote:
| Why are all the cool people working on DBs and talking about
| Paxos?
| A-Dmonkey wrote:
| one of the best applications yet of AI in cyber
| mprime1 wrote:
| Great read. Great product. I've been an early user of Antithesis.
| My background is dependability and formal distributed systems.
|
| This thing is magic (or rather, it's indistinguishable from magic
| ;-)).
|
| If they told me I could test any distributed system without a
| single line of code change, do things like step-by-step
| debugging, even rollback time at will, I would not believe it.
| But Antithesis works as advertised.
|
| It's a game-changer for distributed systems that truly care about
| dependability.
| chrsw wrote:
| Could this work for embedded C projects? Bare metal or RTOS?
| karatekidd32v wrote:
| Not directly related to this post, but clicking around the
| webpage I chuckled seeing Palantir's case study/testimonial:
|
| https://antithesis.com/solutions/who_we_help
| norir wrote:
| I think there is a lot of opportunity for integrating simulation
| into software development. I'm surprised it isn't more common
| though I suppose the upfront investment would scare many away.
| kendallgclark wrote:
| Happy customer here ---- maybe the first or second? Distributed
| systems are hard; #iykyk.
|
| Antithesis makes them less hard (not in an NP-hard sense but
| still!).
| binarymax wrote:
| I got really excited about this, and I spent a little time
| looking through the documentation, but I can't figure out how
| this is different than randomizing unit tests? It seems if I have
| a unit test suite already, then that's 99% of the work? Am I
| misunderstanding? I am drawing my conclusions from reading the
| Getting Started series of the docs, especially the Workloads
| section:
| https://antithesis.com/docs/getting_started/workload.html
| jakewins wrote:
| This is that, and the exact same vibe, except: it promises to
| keep being that simple even after you add threads, and locks,
| and network calls, and disk accesses and..
|
| With this, if you write a test for a function that makes a
| network call and writes the result to disk, your test will fail
| if your code does not handle the network call failing or
| stalling indefinitely, or the disk running out of space, or the
| power going out just before you close the file, or..
|
| So it's; yes, but it expands the space where testing is as easy
| as unit testing to cover much more interesting levels of
| complexity
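|
| To make that concrete, here is a toy, in-process imitation of
| the idea. It is not the Antithesis API and does not show how the
| product works; `FlakyWorld`, `download`, and `explore` are
| invented names. A seeded fake network and disk inject timeouts
| and half-finished writes, and the test checks one property under
| every seed: the output file is either complete or absent.

```python
import random


class NetworkTimeout(Exception):
    pass


class DiskFull(Exception):
    pass


class FlakyWorld:
    """Hypothetical in-memory network and disk whose faults come from a seeded RNG."""

    def __init__(self, seed, fault_rate=0.3):
        self.rng = random.Random(seed)
        self.fault_rate = fault_rate
        self.files = {}

    def fetch(self, url):
        if self.rng.random() < self.fault_rate:
            raise NetworkTimeout(url)
        return f"payload-for-{url}"

    def write(self, path, data):
        if self.rng.random() < self.fault_rate:
            # Simulate running out of space halfway through the write.
            self.files[path] = data[: len(data) // 2]
            raise DiskFull(path)
        self.files[path] = data

    def rename(self, src, dst):
        self.files[dst] = self.files.pop(src)

    def delete(self, path):
        self.files.pop(path, None)


def download(world, url, path, retries=3):
    """Code under test: store url at path without ever leaving a partial file."""
    tmp = path + ".tmp"
    for _ in range(retries):
        try:
            data = world.fetch(url)
            world.write(tmp, data)
            world.rename(tmp, path)  # publish only after a complete write
            return True
        except (NetworkTimeout, DiskFull):
            world.delete(tmp)  # clean up any half-written temp file
    return False


def explore(seeds=range(1000)):
    """Run the same test under many fault schedules, one per seed."""
    for seed in seeds:
        world = FlakyWorld(seed)
        ok = download(world, "example", "out.txt")
        expected = {"out.txt": "payload-for-example"} if ok else {}
        assert world.files == expected, f"seed {seed} left {world.files}"
    return len(seeds)


if __name__ == "__main__":
    print(f"{explore()} fault schedules explored")
```

| Because every fault is drawn from the seed, a failing seed
| replays the exact same sequence of failures.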
| nlavezzo wrote:
| Antithesis here - curious what part of the Getting Started doc
| gave you that impression? If you take a look at our How
| Antithesis Works page, it might help answer your question as to
| how Antithesis is different from just bundling your unit tests.
|
| https://antithesis.com/docs/introduction/how_antithesis_work...
|
| In short though, unit tests can help to inform a workload, but
| we don't require them. We autonomously explore software system
| execution paths by introducing different inputs, faults, etc.,
| which discovers behaviors that may have been unforeseen by
| anyone writing unit tests.
| binarymax wrote:
| Thanks for the response. The linked introduction does help.
| The workload page does give me that impression (and based on
| upvotes of my post it does to others as well)...so perhaps
| disambiguating that the void test*() examples on the
| workloads page are not unit tests might help!
|
| Congrats on the launch and I'll consider using it for some of
| my projects.
| iamnotsure wrote:
| "I love me a powerful type system, but it's not the same as
| actually running your software in thousands and thousands of
| crazy situations you'd never dreamed of."
|
| Would not trust. Formal software verification is badly needed.
| Running thousands of tests means almost nothing in software
| world. Don't fool beginners with your test hero stories.
| mrkeen wrote:
| Great, but formal software verification is not yet broadly
| applicable to most day-to-day app development.
|
| Good type systems (a pretty decent chunk of formal software
| dev) are absolutely necessary and available.
|
| But things get tricky moving past that.
|
| I've tried out TLA+/PlusCal, and one or more things usually
| happen:
|
| 1) The state space blows up and there's simply too much to
| simulate, so you can't run your proof.
|
| 2) With regard to race-detection, you yourself have to choose
| which sections of code are atomic, and which can be
| interleaved. Huge effort, source of errors, and fills the TLA
| file with noise.
|
| 3) Anything you want to run/simulate needs an implementation in
| TLA+. By necessity it's a cut-down version, or 'model'. But
| even when I'm happy to pretend all-of-Kafka is just a single
| linkedlist, there's still _so much (bug-inviting) coding_ to
| model your critical logic in terms of your linked list.
|
| Ironically, TLA+ is not itself typed ( _deliberately_!). In a
| toy traffic light example, I once proved that cars and
| pedestrians wouldn't be given "greenLight" at the same time.
| Instead, the cars had "greenLight" and the pedestrians had
| "green"!
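|
| A minimal sketch of that failure mode, written in Python rather
| than TLA+ (the function names and the toy controller are
| hypothetical, not the original spec): with bare strings the
| safety check passes vacuously because the two sides never use
| the same value, while an enum makes the same check fail as it
| should.

```python
from enum import Enum


def step_untyped(tick):
    # Toy controller with a real bug: both sides get a green signal every tick.
    cars = "greenLight"
    pedestrians = "green"  # same idea, different string
    return cars, pedestrians


def invariant_untyped(cars, pedestrians):
    # Intended meaning: cars and pedestrians are never green together.
    # Because pedestrians never hold "greenLight", this can never be violated.
    return not (cars == "greenLight" and pedestrians == "greenLight")


class Light(Enum):
    RED = "red"
    GREEN = "green"


def step_typed(tick):
    # Same buggy controller, but both sides must share one vocabulary.
    return Light.GREEN, Light.GREEN


def invariant_typed(cars, pedestrians):
    return not (cars is Light.GREEN and pedestrians is Light.GREEN)


# The untyped check "proves" safety; the typed check exposes the bug.
untyped_ok = all(invariant_untyped(*step_untyped(t)) for t in range(10))
typed_ok = all(invariant_typed(*step_typed(t)) for t in range(10))
```

| Here untyped_ok ends up True and typed_ok ends up False, and a
| misspelled member such as Light.GREEN_LIGHT raises
| AttributeError instead of silently comparing unequal.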
| sfink wrote:
| That'll work great for your Distributed QSort Incorporated
| startup, where the only product is a sorting algorithm.
|
| Formal software verification is very useful. But what can be
| usefully formalized is rather limited, and what can be
| formalized correctly in practice is even more limited. That
| means you need to restrict your scope to something sane and
| useful. As a result, in the real world running thousands of
| tests is practically useful. (Well, it depends on what those
| tests are; it's easy to write 1000s of tests that either test
| the same thing, or only test the things that will pass and not
| the things that would fail.) They are _especially_ useful if
| running in a mode where the unexpected happens often, as it
| sounds like this system can do. (It's reminiscent of rr's
| chaos mode -- https://rr-project.org/ linking to
| https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...
| )
| pfdietz wrote:
| Formal verification requires a formal statement of what the
| software is supposed to do.
|
| But if you have that, you have a recipe for doing property
| based testing: generate inputs that satisfy the conditions
| specified in this formal description, then verify that the
| behavior satisfies the specification also.
|
| And then run for millions and millions of inputs.
|
| Is it _really_ going to be worth proving the program correct,
| when you could just run an endless series of tests? Especially
| if the verifier takes forever solving NP-hard theorem proving
| problems at every check-in. Use that compute time to just run
| the tests.
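|
| A minimal sketch of that recipe, assuming the "formal statement"
| is the usual one for sorting (output is ordered and is a
| permutation of the input). It uses only the standard library;
| insertion_sort, satisfies_spec and check are hypothetical names
| chosen for illustration.

```python
import random
from collections import Counter


def insertion_sort(xs):
    # Implementation under test.
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out


def satisfies_spec(inp, out):
    # The specification: output is ordered and contains exactly the input items.
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_items = Counter(inp) == Counter(out)
    return ordered and same_items


def check(impl, trials, seed):
    # Generate random valid inputs; return the first counterexample, or None.
    rng = random.Random(seed)  # seeded, so any failure can be replayed
    for _ in range(trials):
        inp = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not satisfies_spec(inp, impl(list(inp))):
            return inp
    return None
```

| Calling check(insertion_sort, 2000, 0) returns None, while an
| implementation that drops duplicates, such as
| lambda xs: sorted(set(xs)), yields a counterexample list.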
| mamidon wrote:
| I can see how determinism can be achieved (not easy, but
| possible), and I can see how describing a few important system
| invariants can match 100s or 1000s of hand-rolled tests, but
| I'm having a hard time understanding how it's possible to
| intelligently explore the inputs to generate.
|
| e.g. if I wrote a compiler, how would Antithesis generate mostly
| valid source code for it? Simply fuzzing utf8 inputs wouldn't get
| very far.
| pfdietz wrote:
| I don't know how they'd do compiler testing, but I know how I
| do it (testing Common Lisp), and can talk about that if you're
| interested.
|
| But it would be cool to hear how they'd do it.
| chinchilla2020 wrote:
| The blog post has some impressive copy but is lacking details
| on how you would integrate their product.
|
| I am highly skeptical of any claims that something 'magically
| just works' without much configuration or setup.
| intuitionist wrote:
| (Disclosure: I'm an Antithesis employee.)
|
| The blog post is meant as a high-level introduction for a
| general audience. The documentation
| (https://antithesis.com/docs/) goes into considerably more
| detail about what kind of configuration and setup you need to
| start testing with Antithesis.
| Gehinnn wrote:
| Reading this article, I want the same now for js code that
| involves web-workers...
|
| How can I write code that involves a webworker in a way that I
| can simulate every possible CPU scheduling between the main
| thread and the webworker (given they communicate via postMessage
| and no SharedArrayBuffer)? Is it possible to write such a
| brute-force test in pure JS, without having to simulate the
| entire computer?
| mrkeen wrote:
| Use TLA+/PlusCal for this. It's what it's there for.
| gadders wrote:
| This sounds amazing, but I wonder how long it would take to set
| up for any reasonably complex system.
| zoogeny wrote:
| What is described in this post is the gold standard of software
| reliability testing. A world where all critical and foundational
| systems are tested to this level would be a massive step forward
| for technology in general.
|
| I'm skeptical of their claims but inspired by the vision. Even
| taking into account my skepticism, I would prefer to deploy
| systems tested to this standard over alternatives.
| mlsu wrote:
| Pricing doesn't make sense.
|
| What does a CPU hour mean for this framework? How many do I need?
| jewel wrote:
| This reminds me of Java Pathfinder, but for distributed systems.
| ijustlovemath wrote:
| We've done something similar for our medical device; totally
| deterministic simulations that cover all sorts of real world
| scenarios and help us improve our product. When you have
| determinism, you can make changes and just rerun the whole thing
| to make sure you actually addressed the problems you found.
|
| Another nice side effect is that if you hang on to the
| specification for the simulation, you only have to hang on to
| core metrics from the simulation, since the entire program state
| can be reproduced in a debugger by just using the same
| specification on the same code version.
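|
| A minimal sketch of that idea, assuming the "specification" is
| just a seed plus a few parameters and that all randomness flows
| through one seeded generator. Spec, simulate and the toy
| "level" model are hypothetical names, not the actual device
| simulation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    # Everything needed to reproduce a run on a given code version.
    seed: int
    steps: int
    fault_rate: float


def simulate(spec, keep_trace=False):
    # All randomness comes from one generator seeded by the spec.
    rng = random.Random(spec.seed)
    level = 100.0
    faults = 0
    trace = []
    for t in range(spec.steps):
        if rng.random() < spec.fault_rate:
            faults += 1                      # injected fault
            level -= rng.uniform(1.0, 5.0)
        else:
            level += rng.uniform(-0.5, 0.5)  # normal drift
        if keep_trace:
            trace.append((t, level))         # full state, only when debugging
    metrics = {"faults": faults, "final_level": level}
    return metrics, trace


spec = Spec(seed=7, steps=1000, fault_rate=0.05)
stored_metrics, _ = simulate(spec)                      # routine run: keep metrics only
replayed_metrics, trace = simulate(spec, keep_trace=True)  # later: replay with full detail
```

| The replay yields the same metrics as the original run, so only
| the spec and the metrics need to be archived; the per-step
| state can be regenerated on demand.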
| schaum wrote:
| that sounds like "automated advanced chaos monkey" to me
| https://en.wikipedia.org/wiki/Chaos_engineering#Chaos_Monkey
| nlavezzo wrote:
| Depends on what you mean by "advanced" here. We
| specifically cover the differences between Antithesis and Chaos
| Engineering in our "How It's Different" page:
|
| https://antithesis.com/product/how_is_antithesis_different/
|
| Here's the relevant text though:
|
| Antithesis testing resembles chaos testing, in that it injects
| faults to trigger and identify problems. But Antithesis runs
| these tests in a fully deterministic simulated environment,
| rather than in production. This means Antithesis testing never
| risks real-world downtime. This in turn allows for much more
| aggressive fault injection, which finds more bugs, and finds
| them faster. Antithesis can also test new builds before they
| roll out to production, meaning you find the bugs before your
| customer does.
|
| Finally, Antithesis can perfectly reproduce any problem it
| finds, enabling quick debugging. While chaos testing can
| discover problems in production, it is then unable to replicate
| them, because the real world is not deterministic.
| bell-cot wrote:
| First reaction: "Yes, your site's weird font is bugging me!"
| intuitionist wrote:
| (Antithesis employee here.)
|
| We're using Inter, which our designer assures me is pretty
| popular. But this isn't the first time we've heard this
| complaint. Also, we had an issue in testing on a different part
| of our site where the font was getting computed as something
| weird and ugly on our development NixOS machines. Would you
| mind replying with what the browser console says your font is
| getting computed as? Thanks!
|
| Edit: The other person who mentioned this seems to think that
| it's caused by their JavaScript blocker--we're trying to figure
| out why, but in the meantime, enabling JS might help if you
| haven't.
| bell-cot wrote:
| (It's JS blocking - I told NoScript to allow antithesis.com,
| and that completely changed the font in Firefox.)
| zubairq wrote:
| I need to follow this example to build software faster
| ajb wrote:
| I'm sure I've heard of something similar being built, but
| specific to the JVM (ie, a specialised JVM that tests your code
| by choosing the most hostile thread switching points).
| Unfortunately that was mentioned to me at least 10 years ago, and
| I can't find it.
| lifeinthevoid wrote:
| I don't want to sound silly, but there are 24 open and 37 closed
| bugs on the FoundationDB GitHub page. Could it perhaps be that
| bug-free is somewhat exaggerated?
|
| Antithesis looks very promising by the way :-)
|
| Edit: perhaps Apple didn't continue the rigorous testing while
| evolving the FoundationDB codebase.
| jsdwarf wrote:
| > and found all of the bugs in the database
|
| This is when I stopped reading
| hintymad wrote:
| I really like Antithesis' approach: it's non-intrusive as all the
| changes are on a VM so one can run deterministic simulation
| without changing their code. It's also technically challenging,
| as making a VM suitable for deterministic simulation is not an
| easy feat.
|
| On a side note, I was wondering how this approach compares to
| Meta's Hermit (https://github.com/facebookexperimental/hermit),
| which is a deterministic Linux instead of a VM.
| stavros wrote:
| This is really impressive, but still, if you're working on a
| piece of software where this can work, count yourself lucky. Most
| software I've worked on (boring line-of-business stuff) would
| need as many lines of code to test a behavior as to implement the
| behavior.
|
| It's not very frequently that you have a behavior that's very
| hard to make correct, but very easy to check for correctness.
| shuntress wrote:
| > a platform that takes your software and hunts for bugs in it
|
| Ok but, what actually IS it?
|
| It seems like it is a cloud service that will run integration
| tests. I have to figure out how to deploy to this special
| environment and I still have to write those integration tests
| using special libraries.
|
| But even after all that integration refactoring, how is this
| supposed to help me find actual bugs that I wouldn't already have
| found in my own environment with my own integration tests?
| mrinterweb wrote:
| I came away with the same questions.
| mcapodici wrote:
| So this is a valgrind for containers? "If" it works well, and
| doesn't flag false things, this is pretty useful.
|
| You might want to sell this as a bug finder. But it also could be
| sold as a security hardening tool.
| georgelyon wrote:
| Congratulations to the Antithesis team!
|
| I actually interviewed with them when they were just starting,
| and outside of being very technically proficient, they are also a
| great group of folks. They flew my wife and me out to DC on what
| happened to be the coldest day of the year that year (we are from
| California) so we didn't end up following through but I'd like to
| think there is an alternative me out there in the multiverse
| hacking away on this stuff.
|
| I highly recommend Will's talks (which I believe he links in the
| blog post):
|
| https://m.youtube.com/watch?v=4fFDFbi3toc
|
| https://m.youtube.com/watch?v=fFSPwJFXVlw
| JonChesterfield wrote:
| > We thought about this and decided to just go all out and write
| a hypervisor which emulates a deterministic computer.
|
| Huh. Yes, that would work. It's in the category of obvious in
| hindsight. That is a very convincing sales pitch.
___________________________________________________________________
(page generated 2024-02-13 23:00 UTC)