[HN Gopher] Ask HN: Do You Test in Production?
___________________________________________________________________
Ask HN: Do You Test in Production?
There are a lot of blog posts arguing that testing in prod should
not be the taboo it may have been in the 90s.
I've read some of these [1] [2], I get the arguments in favour of
it, and I want to try some experiments. My question is -- how does
one go about doing it _safely_? In particular, I'm thinking about
data. Is it common practice to inject fabricated data into a prod
system to run such tests? What's the best practice or prior art on
doing this well? Ultimately, I think this will end up looking like
implementing SLIs and SLOs in PROD, but for some of my SLOs, I
think I need to actually _fake_ the data in order to get the SLIs I
need, so how should I do this? Suggestions appreciated -- thanks. [1]
https://increment.com/testing/i-test-in-production/ [2]
https://segment.com/blog/we-test-in-production-you-should-too/
Author : bradwood
Score : 19 points
Date : 2023-01-14 22:13 UTC (46 minutes ago)
| tonymet wrote:
| Yes, you can do so with a canary tier. Assuming your code is well
| instrumented to distinguish performance and quality regressions,
| a canary tier served to customers will catch more regressions
| than synthetic testing.
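|
| A minimal sketch of the traffic-split side of that (hypothetical
| backend names; in practice this logic usually lives in a load
| balancer or service mesh rather than application code):
|
      import random

      STABLE = "https://app-stable.internal"  # assumed stable tier
      CANARY = "https://app-canary.internal"  # assumed canary tier
      CANARY_FRACTION = 0.05  # ~5% of real traffic goes to canary

      def pick_backend() -> str:
          # Send a small random slice of production traffic to the
          # canary; instrument both tiers identically so regressions
          # show up as a delta between them.
          return CANARY if random.random() < CANARY_FRACTION else STABLE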
| rr808 wrote:
| Depends a lot on your application and how big the changes are. If
| you're an online store and you're pushing out incremental changes
| to a subset of users, it's a good strategy. If it's an aircraft
| autopilot, not so much.
| csours wrote:
| Everyone tests in production. Some people also test before
| production!
|
| Some people try to NOT test in production, but everyone does test
| in prod in a very real sense because dependencies and
| environments are different in prod.
|
| I think the question was "Do you INTENTIONALLY test in
| production"
| bradwood wrote:
| I see a lot of suggestions in the comments for feature flags --
| we've been using these from the beginning, to very good effect.
|
| However, flags turn _code_ on and off, not data, and my main area
| of interest here is how to deal with the test data problem in
| prod.
| quickthrower2 wrote:
| In a multi-tenant system, one of the accounts can be a test
| account. Within that you can run integration tests. You might
| need special cases: test payment accounts and credit cards, test
| pricing plans, and so on.
|
| Some basic ping tests and other checks before swapping a new
| version into production (as in preparing it, initiating it, and
| pointing the load balancer at it) would also be smart.
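|
| A hedged sketch of what an in-prod integration test scoped to
| such a test account might look like (the endpoints, header, and
| test card number are all illustrative):
|
      import requests

      BASE = "https://api.example.com"
      TEST_TENANT = "tenant-test-001"  # the designated test account
      TEST_CARD = "4242424242424242"   # a number the gateway treats
                                       # as a fake test card

      def test_checkout_happy_path():
          # Everything is scoped to the test tenant, so real
          # customers and real money are never touched.
          r = requests.post(
              f"{BASE}/v1/orders",
              headers={"X-Tenant": TEST_TENANT},
              json={"sku": "test-sku", "card": TEST_CARD},
          )
          assert r.status_code == 201
          assert r.json()["status"] == "paid"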
| HereBeBeasties wrote:
| Good testing is an exercise in pushing I/O to the fringes, as
| that's what has stateful side-effects. (Some might even argue
| that anything that tests I/O is an integration test. The term
| "integration test" is not well defined and not worth getting hung
| up over IME.)
|
| Once you're into testing I/O, which is ultimately unavoidable no
| matter how hard you try to avoid it, you either need cooperative
| third parties who can give you truly representative test systems
| (rare) or a certain amount of test-in-prod.
|
| Testing database stuff remains hard. You either wrap things in
| some kind of layer you can mock out, or dupe prod (or some subset
| of it) into a staging environment with a daily snapshot or
| similar and hope any differences (scale, normally) aren't too bad.
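|
| A minimal sketch of the mockable-layer option (names are
| illustrative; the real implementation would talk to the
| production database):
|
      from typing import Protocol

      class OrderStore(Protocol):
          def save(self, order_id: str, total: int) -> None: ...
          def total_for(self, order_id: str) -> int: ...

      class FakeOrderStore:
          """In-memory stand-in used by tests; production wiring
          swaps in the real database-backed store."""
          def __init__(self) -> None:
              self._rows: dict[str, int] = {}

          def save(self, order_id: str, total: int) -> None:
              self._rows[order_id] = total

          def total_for(self, order_id: str) -> int:
              return self._rows[order_id]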
|
| Copy-on-write systems or those with time-travel and/or
| immutability help immensely with test-in-prod, especially if you
| can effectively branch your data. If it's your own systems you
| are testing against, things like lakefs.io look pretty useful in
| this regard.
| cloudking wrote:
| I think it depends on how your application works. If you have the
| concept of customers, then you can, for example, have a test
| customer in production whose test data doesn't affect real
| customers. You can reset the test customer's data each time you
| want to test.
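|
| A sketch of that reset step, assuming a psycopg-style database
| connection and a hypothetical designated test-customer id:
|
      TEST_CUSTOMER_ID = "cust-test-001"  # assumed test customer

      def reset_test_customer(conn) -> None:
          # Wipe whatever previous test runs created, then re-seed
          # a known baseline so every run starts from the same state.
          with conn.cursor() as cur:
              cur.execute("DELETE FROM orders WHERE customer_id = %s",
                          (TEST_CUSTOMER_ID,))
              cur.execute(
                  "INSERT INTO customers (id, name, is_test) "
                  "VALUES (%s, %s, TRUE) ON CONFLICT (id) DO NOTHING",
                  (TEST_CUSTOMER_ID, "Test Customer"),
              )
          conn.commit()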
| brianwawok wrote:
| Any time you need to talk to a third-party API, you need to test
| in prod.
|
| Some people have sandbox APIs. They are generally broken and not
| worth it. See eBay for a super in-depth sandbox API that never
| works.
|
| You can read the docs 100 times over. At the end of the day, the
| API is going to work like it works. So you kind of "have to" test
| in prod for these guys.
| lpapez wrote:
| Ditto regarding Paypal: you need a sandbox API token to get
| started with it. Their sandbox token generator was broken for
| MONTHS, I could not believe it. By the time we got the token, we
| had already fixed all the bugs on our side the hard way - by
| testing in prod - and moved on.
| __s wrote:
| Yes
|
| Just because you have staging doesn't mean you don't need unit
| tests. Similarly, test in stage, then test in prod. Ideally in a
| way isolated from real prod users (e.g., in an insurance system
| we had fake dealer accounts for testing).
| fleekonpoint wrote:
| We run canaries in prod. They aren't as extensive as the
| integration tests that run in our test stages, but they still
| cover the happy paths for most of our APIs.
| natoliniak wrote:
| Feature flags.
| atemerev wrote:
| In electronic trading, most new systems are tested in production
| by running with a smaller capital allocation first. It is hard to
| iron out all the bugs unless you are on the real market with real
| money and real effects (of course, simulation testing and unit
| testing are heavily employed too).
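|
| As a hedged sketch, "smaller capital allocation" is essentially a
| scaling wrapper around the new system's order flow (the names
| here are illustrative stubs, not a real order-entry API):
|
      CANARY_ALLOCATION = 0.01  # new system trades 1% of normal size

      def send_to_market(symbol: str, qty: int) -> None:
          # Stub standing in for the real order-entry path.
          print(f"order: {qty} x {symbol}")

      def place_order(symbol: str, qty: int,
                      via_new_system: bool) -> None:
          # The rewritten system trades real money and sees real
          # market effects, but at a fraction of normal size until
          # it has earned trust.
          if via_new_system:
              qty = max(1, int(qty * CANARY_ALLOCATION))
          send_to_market(symbol, qty)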
| revskill wrote:
| It's more about handling production errors quickly than testing
| in production. Feature flags are a good way.
| paxys wrote:
| Lots of ways to test in production. IMO the way you are
| suggesting - injecting synthetic data into prod - is the worst of
| both worlds. You aren't actually testing real world use cases,
| and end up polluting your prod environment.
|
| Some common ways to go about this:
|
| - Feature flags: every new change goes into production behind a
| flag. You can flip the flag for a limited set of users and do a
| broader rollout when ready (a minimal sketch follows this list).
|
| - Staged rollouts: have staging/canary etc. environments and roll
| out new deployments to them first. Observe metrics and alerts to
| check if something is wrong.
|
| - Beta releases: have a group of internal/external power users
| test your features before they go out to the world.
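|
| A minimal sketch of the feature-flag bucketing mentioned above
| (hand-rolled here for illustration; real systems typically use a
| flag service or in-house equivalent):
|
      import hashlib

      def flag_enabled(flag: str, uid: str, pct: float) -> bool:
          # Deterministic bucketing: the same user always lands in
          # the same bucket, so their experience stays stable as
          # the flag ramps from 0 to 100.
          digest = hashlib.sha256(f"{flag}:{uid}".encode()).digest()
          bucket = int.from_bytes(digest[:4], "big") / 2**32
          return bucket < pct / 100

      # Start with 5% of users, widen when metrics look healthy.
      if flag_enabled("new-checkout", "user-123", pct=5.0):
          ...  # new code path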
| cuuupid wrote:
| I work for a B2E company that has a structure similar to
| Salesforce. We test in production all the time even for our
| secure environments where the data is highly sensitive.
|
| Re: data, it's a somewhat common practice to notionalize data
| (think isomorphically faking data). We regularly do this and will
| often designate rows as notional to hide them from users who
| aren't admins. I've found this to work exceptionally well; we do
| this 1-2 times a week, ensure there's a closed circuit for
| notional data, and for more critical systems we'll inform our
| customers that testing will occur.
|
| I'm sure there are more complex and automated solutions but when
| it comes to testing, simple and flexible is often the way to go.
| bradwood wrote:
| Thanks. This sounds interesting.
|
| Can you give a bit more colour on "notionalizing" and
| "isomorphically faking", please?
| cuuupid wrote:
| Essentially creating fake data that looks very realistic and
| creates narratives that would span real use cases. Some of
| this is simple (fake names with faker), some of it is a bit
| more manually guided (customer-specific terminology and
| specific business logic).
|
| The goal here is for the data to both be useful for testing
| and provide coverage not just at a software level, but at a
| user story level. This helps test things like cross-application
| interactions; it's also doubly helpful since we can use it for
| demos without screwing up production data.
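|
| A hedged sketch of that kind of notional-data generation, using
| the Python faker package plus an explicit flag so the rows can be
| hidden from non-admin users and cleaned up afterwards:
|
      from faker import Faker

      fake = Faker()

      def make_notional_customer() -> dict:
          # Realistic-looking but entirely fabricated record; the
          # is_notional flag keeps it out of real users' views and
          # out of business reporting.
          return {
              "name": fake.name(),
              "email": fake.email(),
              "company": fake.company(),
              "is_notional": True,
          }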
| piyh wrote:
| >notionalize data (think isomorphically faking data)
|
| Are these just $5 words for setting a fake data flag on the
| records?
| cuuupid wrote:
| We do that too but notionalizing for us is usually creating
| data that looks and behaves realistically but is actually
| fake. (A side benefit to this is that we can then use it for
| demos!)
| nonethewiser wrote:
| So you mock data and then flag it as fake.
| cuuupid wrote:
| Essentially yes! We usually try to follow some sort of
| theoretical user story/paint some sort of narrative but
| at the end of the day it's just adjusting the mocking.
|
| Just now realizing notionalizing isn't a widely accepted
| term for this
| sethammons wrote:
| Note: if you plan on accurate financial planning and metrics
| (esp. if going public), you need to be able to separate your test
| prod stats from the real prod stats for reporting.
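|
| In practice that separation can reuse the same test/notional flag
| at the reporting layer (an illustrative filter, not a complete
| reporting pipeline):
|
      def reportable_revenue(orders: list[dict]) -> int:
          # Exclude synthetic/test rows so reported financials
          # reflect only real customers.
          return sum(o["total"] for o in orders
                     if not o.get("is_test"))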
| dmitriid wrote:
| A/B tests and feature flags are basically testing in prod. And
| yes, some of those features sometimes run as a "well, it should
| work, but we're not entirely sure until we get a significant
| number of users using the system". It could be an edge case
| failing or scalability requirements being wrong.
|
| Another variation on the same theme comes up when rewriting
| systems: you run production data through both the old and the new
| system. Quite often that's the only way of doing migrations to a
| new platform, or a new database, or, yes, a newly re-written
| system.
|
| > Is it common practice to inject fabricated data into a prod
| system to run such tests? What's the best practice or prior art
| on doing this well?
|
| A very common practice is to run a snapshot of prod data (e.g.
| last hour, or last 24 hours, or even a week/month/year) through a
| system in staging (or cooking, or pre-cooking, or whatever name
| you give the system that's just about to be released). However,
| doing it properly may not be easy, and depends on the systems
| involved.
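|
| A hedged sketch of the snapshot-replay idea (the capture format
| and staging URL are assumptions; the essence is replaying
| recorded prod requests against the candidate and diffing the
| answers):
|
      import json
      import requests

      CANDIDATE = "https://staging.example.com"  # system about to ship

      def replay(snapshot_path: str) -> int:
          # Each line: {"path": "/v1/...", "expected": <prod response>}
          mismatches = 0
          with open(snapshot_path) as f:
              for line in f:
                  rec = json.loads(line)
                  got = requests.get(CANDIDATE + rec["path"]).json()
                  if got != rec["expected"]:
                      mismatches += 1
          return mismatches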
| turtleyacht wrote:
| Sometimes one cannot get the exact same specs on test hardware
| versus production, yet a rollout depends on simulating system
| load to shake out issues.
|
| Performance testing needs a schedule, visibility, timebox, known
| scope, backout plan, data revert plan, pre- and post-graphs.
        Schedule. Folks are clearly tagged in a table with times
        down the side.
        Visibility. Folks who should know, know when it's going
        to happen, are invited to the session, and are mentioned
        in the distributed schedule.
        Timebox. It's going to start at a defined time and end at
        a defined time.
        Known scope. Is it going to fulfill an order? How many
        accounts created?
        Backout plan. DBA and DevOps on standby for stopping the
        test.
        Data revert plan. We know what rows to delete or update
        after testing.
        Pretty pictures. You want to show graphs during the test,
        so that you know what to improve and everyone's time
        wasn't wasted.
|
| Reference: observing successful runs that didn't result in
| problems later.
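|
| The timebox and data-revert items lend themselves to a small
| harness; a hedged sketch (the endpoint and SKU are illustrative):
|
      import time
      import requests

      TARGET = "https://api.example.com/orders"
      DURATION_S = 15 * 60  # hard timebox: stop after 15 minutes

      def run_load_test() -> list[float]:
          latencies = []
          deadline = time.monotonic() + DURATION_S
          while time.monotonic() < deadline:
              t0 = time.monotonic()
              requests.post(TARGET, json={"sku": "loadtest-sku"})
              latencies.append(time.monotonic() - t0)
          return latencies  # feed these into the pre/post graphs

      # Data revert plan afterwards, e.g.:
      #   DELETE FROM orders WHERE sku = 'loadtest-sku';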
___________________________________________________________________
(page generated 2023-01-14 23:00 UTC)