[HN Gopher] Addition with flamethrowers - why game devs don't un...
       ___________________________________________________________________
        
       Addition with flamethrowers - why game devs don't unit test
        
       Author : jgalecki
       Score  : 37 points
       Date   : 2024-05-24 11:25 UTC (1 day ago)
        
 (HTM) web link (www.pixelatedplaygrounds.com)
 (TXT) w3m dump (www.pixelatedplaygrounds.com)
        
       | shaftway wrote:
        | I don't buy this argument. Most game developers I know have said
        | that unit tests are a waste of time so they never use them, but
        | then they struggle to make changes to utility code while making
        | sure it doesn't do the wrong thing. Y'know, what unit tests
        | are for.
       | 
       | I think the key here is that the perceived cost / benefit ratio
       | is too high. It's the perception that drives their behavior
       | though. I'm in a company now that has zero unit tests, because
       | they just don't see the value in it (and in their case they may
       | be right for a whole slew of reasons).
       | 
       | Also, remember that games are not very long-lived pieces of
       | software. You build it, release it, maybe patch it, and move on.
       | If the game moves to version 2 then you're probably going to re-
       | write most of the game from scratch. When you support software
       | for a decade then the code is what's valuable, and unit tests
       | keep institutional knowledge about the code. But with disposable
       | software like games, the mechanics of the game and IP are what's
       | valuable.
       | 
       | Why would you write a unit test for something you know you're
       | going to throw away in 6 months?
        
         | Atotalnoob wrote:
         | I am curious as to why your current company does not have unit
         | tests. Do you mind sharing?
        
           | shaftway wrote:
           | We produce a library that gets included in software made by
           | our clients, and we have several thousand clients. The uptake
           | on new releases is low (most of the clients believe in "if it
           | ain't broke, don't fix it"). So every release has the
           | potential to live in the wild and need support for a long
           | time.
           | 
           | We're also in an industry with a ton of competitors.
           | 
            | On top of that, the company was founded by some very junior
            | engineers. For most of them this was their first or second
            | job out of college. Literally every anti-pattern is in our
            | codebase, and a lot of them are considered best practices
            | internally. Unit tests were perceived as a cost with little
            | benefit, so none were written. New engineers were almost
            | always new grads, to save on money.
           | 
           | These facts combined make for an interesting environment.
           | 
           | For starters, leadership is afraid to ship new code, or even
           | refactor existing code. Partially because nobody knows how it
           | works, partially because they don't have unit tests to verify
           | that things are going well. All new code has to be gated by
           | feature flags (there's an experiment right now to switch from
           | try-finally to try-with-resources). If there isn't a business
           | reason to add code, it gets rejected (I had a rejected PR
           | that removed a "synchronized" block from around "return
           | boolValue;"). And it's hard to say they're wrong. If we push
           | out a bad release, there's a very real chance that our
           | customers will pack up and migrate to one of our competitors.
           | Why risk it?
           | 
           | And the team's experience level plays a role too. With so
            | many junior engineers and so much coding-skill inbreeding,
           | "best practices" have become pretty painful. Code is written
           | without an eye towards future maintainability, and the
           | classes are a gas factory mixed with a god object. It's not
           | uncommon to trace a series of calls through a dozen classes,
           | looping back to classes that you've already looked at. And
           | trying to isolate chunks of the code is difficult. I recently
           | tried to isolate 6 classes and I ended up with an interface
           | that used 67 methods from the god object, ranging from
           | logging, to thread management, to http calls, to state
           | manipulation.
           | 
           | And because nobody else on the team has significant
           | experience elsewhere, nobody else really sees the value of
           | unit tests. They've all been brought up in this environment
            | where unit tests are not mentioned, and so it has ingrained
           | this idea that they're useless.
           | 
           | So the question is how do you fix this and move forward?
           | 
           | Ideally we'd start by refactoring a couple of these classes
           | so that they could be isolated and tested. While management
           | doesn't see significant value in unit tests, they're not
           | strictly against them, but they are against refactoring code.
           | So we can't really add unit tests on the risky code. The only
           | places that you can really add them without pushback would be
           | in the simplest utility classes, which would benefit from
           | them the least, and in doing so prove to management that unit
           | tests aren't really valuable. And I mean the SIMPLEST utility
           | classes. Most of our utility classes require the god object
           | so that we can log and get feature flags.
           | 
           | I say we take off and nuke the entire site from orbit (start
           | over from scratch with stronger principles). It's the only
           | way to be sure. But there's no way I'm convincing management
           | to let the entire dev team have the year they'd need to do
           | that with feature parity, and leadership would only see it as
           | a massive number of bugs to fix.
           | 
           | In the meantime developer velocity is slowing, but management
           | seems to see that as a good thing. Slower development
           | translates into more stable code in their minds. And the
           | company makes enough that it pays well and can't figure out
           | what to do with the excess money. So nobody really sees a
           | problem. Our recruiters actually make this a selling point,
           | making fun of other companies that say their code is "well
           | organized".
        
             | Atotalnoob wrote:
             | Thank you for the write up.
             | 
             | That seems like a bad scenario with bad technical
              | management. I am wondering if you have considered not
              | pushing for unit tests and instead thinking about end-to-
              | end tests. These might be easier for anti-testing people
              | to buy into, because they directly ensure your end users
              | get the desired outcomes.
             | 
              | It doesn't matter what terrible practices you have inside
              | your library if the output is correct...
             | 
             | If you input 1+1, and it outputs 5, it will be obvious how
             | this can be an issue.
             | 
             | What this will enable you to do is get some quick wins and
             | make refactoring safer.
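              | 
              | As a sketch of what I mean (names made up; libraryAdd
              | stands in for whatever the library's public entry point
              | is), the test only touches what clients touch:
              | 
              |     #include <cassert>
              |     
              |     // Hypothetical public entry point of the shipped
              |     // library; internals can stay a mess for now.
              |     int libraryAdd(int a, int b) { return a + b; }
              |     
              |     int main() {
              |         // End-to-end framing: real inputs in, verify the
              |         // outputs the clients actually see.
              |         assert(libraryAdd(1, 1) == 2);
              |         assert(libraryAdd(-3, 3) == 0);
              |     }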
             | 
             | If management still says no, I see 3 major choices.
             | 
             | 1. Quit
             | 
             | 2. Write your tests and keep them to yourself
             | 
             | 3. Mind control
        
               | other_herbert wrote:
               | As a corollary to 2, management tends to love graphs...
                | whatever you're using to build should have a plugin that
               | could show unit test success counts and generate even a
               | simple line graph... that alone might be enough incentive
               | to add more testing
        
               | Atotalnoob wrote:
               | I wouldn't use the term "unit test" if they are negative
               | on the concept.
               | 
                | Edit: in fact, don't say test at all. Talk about
               | verification of the output
        
               | shaftway wrote:
               | We do have an integration test that runs just before
               | releases. I've never seen it fail, even when something
               | was obviously broken, so I question the utility of it.
               | There's a specific person in charge of maintaining it.
               | 
               | I've opted for option 4: continue to write code the way
               | they want it written and keep cashing my paychecks. In
               | the meantime there are tons of other improvements that
               | I'm working on, some of which have a more direct impact
               | on business revenue (which has a direct impact on my
               | personal revenue).
        
             | phito wrote:
             | How do you stay sane working with clowns?
        
               | andybak wrote:
               | Many of the things he just described are a rational
               | response to historical circumstance. It's fine to say
               | "we're in a bad place" but that's not the same as saying
               | "we're currently making bad decisions".
        
               | HideousKojima wrote:
               | I get paid good money to work with clowns
        
         | feoren wrote:
         | > Also, remember that games are not very long-lived pieces of
         | software. You build it, release it, maybe patch it, and move
         | on.
         | 
         | This was true a couple decades ago. Nowadays many games are
         | cash cows for decades. Path of Exile was released in 2013,
         | Minecraft in 2011, and World of Warcraft in 2004, and all of
         | those continue to receive regular updates (and have over the
         | course of their lives) and still make plenty of money today.
         | Dwarf Fortress has been in continual development since 2002!
         | (Although probably not your ideal cash-flow model.)
         | 
         | Or you have the EA Sports model where you use the same "engine"
         | and just re-skin some things and re-release the same game over
         | and over. There has been a new "Football Manager" game every
         | year since 2005 -- do you really think they throw out all their
         | code and start over every year?
        
           | shaftway wrote:
           | I maintain that the majority of games are still disposable,
           | despite the occasional subscription model or long-lived hit
           | that pops up. Remember that most games aren't made by AAA
           | studios.
           | 
           | Wasn't Minecraft completely rewritten from scratch in Java
           | after a few years?
           | 
           | And the EA one, like you said, it's just model updates. Very
           | few gameplay mechanics get more than a simple tweak. Just
           | recompile with the new models. You don't need unit tests if
           | the code never changes.
        
             | mark38848 wrote:
             | I think Minecraft was originally written in Java and
             | rewritten in a good programming language (i.e. not Java).
        
               | grumpyprole wrote:
               | Whether or not one thinks C++ is a "good" language, I
               | always thought that (original) Minecraft busted the myth
               | that blockbuster games had to be written in C++.
        
               | CuriousSkeptic wrote:
               | Being written in Java was probably instrumental in
               | enabling the huge modding community around Minecraft.
               | Which in turn was probably in large part responsible for
               | its success.
        
             | cobalt wrote:
              | The original Minecraft is in Java; it's probably gone
              | through a lot of code transformation. The version you're
              | thinking of is the Microsoft version, rewritten in C++.
        
           | oefrha wrote:
           | You can add rigor to your decade-plus cash cow later, once
           | it's clear that you've hit the jackpot.
        
           | 2muchcoffeeman wrote:
           | I still play games that came out a couple decades ago...
        
             | tmtvl wrote:
             | Let me guess... Super Metroid? Chrono Trigger? Final
             | Fantasy VI? Ultima Underworld? Symphony of the Night?
             | 
             | There were a few decent games released in the '80s and
             | '90s.
        
         | abdullahkhalids wrote:
         | Valve became serious about software quality in Dota 2 around
          | 2017 - about 7 years after launch. Before that, game updates
          | were accompanied by lots of bugs that would take weeks to
          | fix. These days there are still tons of bugs, but it's much
          | better than before. They just released one of the biggest
          | updates in the game's history this week, and there are hardly
          | any bugs being reported.
         | 
         | I am pretty sure there is some sort of automated testing
         | happening that is catching these bugs before release.
        
           | zubspace wrote:
            | Reminds me of an article about the testing infrastructure of
            | League of Legends [1] back in 2016: 5500 tests per build, in
            | 1 to 2 hours.
           | 
            | Games are extremely hard to test. For me they fall into the
            | same category as GUI testing frameworks, which imho are
            | extremely annoying and brittle. Except that games are
            | comparable to a user interface consisting of many buttons
            | which you can short- and long-press and drag around, while
            | at the same time other bots are pressing the same buttons,
            | sharing the same state influenced by a physics engine.
            | 
            | How do you test such a ball of mud, which also constantly
            | changes as devs try to follow the fun? Yes, you can unit
            | test individual, reusable parts. But integration tests,
            | which require large, time-sensitive modules, all strapped
            | together and running at the same time? It's mindbogglingly
            | hard.
           | 
            | Moreover, if you're in a conceptual phase of development,
            | prototyping an idea, tests make no sense. The requirements
            | change all the time and complex tests hold you back. But the
            | funny thing is that game development stays in that phase
            | most of the time. And when the game is done, you start a new
            | one with a completely different set of requirements.
           | 
           | There are exceptions, like League of Legends. The game left
           | the conceptual phase many years ago and its rules are set in
           | stone. And a game which runs successfully for that long is
           | super rare.
           | 
           | [1] https://technology.riotgames.com/news/automated-testing-
           | leag...
        
             | abdullahkhalids wrote:
              | I doubt Dota 2 devs are writing tests like this. The game
              | is far too complicated, even more so than League, and
              | changes too much over the years for this to be viable.
             | 
              | Dota 2 and OpenAI had a collaboration in 2018ish, and
              | during this time the Dota 2 bots system was reworked
             | completely. They already can generate videos of every spell
             | in action [1], and I would assume this is done by asking AI
             | bots to demonstrate the spell. My guess is that before
             | pushing out an update, a human looks at these videos and
             | other more complex interaction videos for every major
             | change, along with relevant numbers (damage, healing,
              | movement speed), and sees if everything makes sense.
             | 
              | I think this because, a lot of times recently, changes to
              | one hero have caused an un-updated hero to break because
              | they shared some backend similarity. And the patch is
              | released with the bug.
             | 
             | Then again, there is no public info, so all the above are
             | wild speculations.
             | 
             | [1] example https://www.dota2.com/hero/treantprotector
        
               | duskwuff wrote:
               | > They already can generate videos of every spell in
               | action [1]
               | 
               | I'm fairly certain those videos are all handmade. (Yes,
               | all 500+ of them.) Notice that the videos for each hero
               | are recorded in different locations on the map, and the
               | "victim" hero isn't always the same.
        
             | rhdunn wrote:
             | I recall some Minecraft tests being saved worlds with
             | redstone logic that will light a beacon green if it is
              | working or red if not. That's useful for games like that.
             | 
             | For games like Starcraft 2 with replay functionality, you
             | could probably record/use several matches and test that the
             | behaviour matches the recorded behaviour. If you can make
             | your game have a replay feature you can make use of this,
             | even if you don't ship that replay code.
             | 
             | For things like CYOA type games or decision trees, you
             | could have a logging mechanism that prints out the choices,
             | player stats, hidden stats, etc. and then have a way to run
             | through the decisions, then check the actual log output
             | against the expected output. -- I've done something similar
             | when writing parsers by printing out the parse tree (for
             | AST parser APIs) or the parse events (for reader/SAX parser
             | APIs).
             | 
             | I'm sure there are other techniques for testing other parts
             | of the system. For example, you could test the rendering by
             | saving the render to an image and comparing it against an
             | expected image. IIRC, Firefox does something similar for
             | some systems like the SVG renderer and the HTML paint code.
             | 
             | Various of these features (replay, screenshots) are useful
             | to have in the main game.
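              | 
              | A tiny sketch of the log-and-compare idea (toy code, not
              | from any real game; playScript and the stats are made
              | up):
              | 
              |     #include <cassert>
              |     #include <sstream>
              |     #include <string>
              |     #include <vector>
              |     
              |     // Scripted run through a CYOA-style scene: every
              |     // choice appends a line describing the new state.
              |     std::string playScript(const std::vector<int>& cs) {
              |         std::ostringstream log;
              |         int health = 10, gold = 0;  // toy player stats
              |         for (int c : cs) {
              |             if (c == 0) health -= 2;  // fight
              |             else        gold += 5;    // loot
              |             log << "choice=" << c << " health=" << health
              |                 << " gold=" << gold << "\n";
              |         }
              |         return log.str();
              |     }
              |     
              |     int main() {
              |         // Golden output, checked in next to the test; a
              |         // diff pinpoints where behaviour diverged.
              |         const std::string expected =
              |             "choice=0 health=8 gold=0\n"
              |             "choice=1 health=8 gold=5\n";
              |         assert(playScript({0, 1}) == expected);
              |     }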
        
               | zubspace wrote:
                | You're right about parts which are mostly state
                | machines. They have a defined input and output. Tests
                | are straightforward to implement and adjust.
               | 
               | But recording and replaying matches? Taking screenshots
               | and comparing the output? Just think about it: If you
               | have recorded a match and change the hitpoints of a
               | single creature, the test could possibly fail. And then?
               | Re-record the match?
               | 
               | The same applies to screenshots: What happens if models,
               | sprites or colors change?
               | 
               | In my experience, tests like this are annoying, because:
               | 
               | 1) They take a long time to create and adjust/recreate.
               | 
               | 2) They fail for minor reasons.
               | 
                | 3) It takes time to understand what such tests even
                | measure, if someone else made them.
               | 
               | 4) You need a large, self made framework to support such
               | tests.
               | 
               | 5) It takes a long time to run them, because they are
               | time dependent.
               | 
                | 6) They hinder you from making large changes.
               | 
                | 7) It's cheaper to have some low-wage game testers play
                | your game. Or better, make the game early access and let
                | 1000s of players test your game for free, while even
                | making money off them.
        
               | vlovich123 wrote:
               | Yes, when you are trying to intentionally change the
               | output, you simply regenerate the gold file to be used as
               | reference (and yes, it should be easy). It's brittle for
               | sure but it does catch unintentional changes and should
               | be used where relevant (if sparingly). There are
               | definitely existing frameworks that do this (eg Jest
               | calls this snapshot testing and has tooling to make it
               | easy).
               | 
               | I'm sorry your experiences with this kind of stuff have
               | been bad. I've generally had good experiences in the
               | machine learning space where we used it judiciously where
               | appropriate but didn't overdo it.
               | 
                | I don't see how it can ever hinder you though - you can
                | always choose to go "I don't care that the output has
                | changed dramatically - it's the new ground truth" as
                | long as you communicate that that's what's happening in
                | your commit. What it doesn't let you do is have output
                | that's different every time you run it, but that's
                | generally a positive (randomness should be injected
                | intentionally and deterministically).
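                | 
                | A minimal sketch of the regenerate-the-gold-file flow
                | (file name and env var invented; Jest does the same
                | thing with fancier tooling):
                | 
                |     #include <cstdlib>
                |     #include <fstream>
                |     #include <iostream>
                |     #include <sstream>
                |     #include <string>
                |     
                |     // Hypothetical system under test.
                |     std::string render() { return "hp: 100\n"; }
                |     
                |     int main() {
                |         const char* golden = "render.golden";
                |         std::string actual = render();
                |         // Intentional change: regenerate the reference.
                |         if (std::getenv("UPDATE_GOLDEN")) {
                |             std::ofstream(golden) << actual;
                |             return 0;
                |         }
                |         // Otherwise, any drift is an unintended change.
                |         std::ifstream in(golden);
                |         std::stringstream want;
                |         want << in.rdbuf();
                |         if (actual != want.str()) {
                |             std::cerr << "differs from " << golden
                |                       << "\n";
                |             return 1;
                |         }
                |         return 0;
                |     }
                | 
                | Run once with UPDATE_GOLDEN=1 to create the reference;
                | plain runs then fail on any unintentional change.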
        
         | treflop wrote:
         | I've seen people slog through untested code where they fear to
         | make a change but I've also seen people slog through code with
         | too much test coverage where the tests go through constant
         | churn.
         | 
         | I don't understand why people don't just add one test even if
         | the codebase otherwise has zero tests if they're so scared of
         | one area and I don't get why people keep adding excessive
         | coverage if it's wasting their time.
         | 
         | It's like people pick a stance and then stick with it forever
         | when I couldn't care less how I've been doing something for 10
         | years if today you showed me a better way.
        
           | smrq wrote:
           | This is the way. My work codebase has probably 5% unit test
           | coverage -- it's frontend and a lot of it isn't sensible to
           | unit test -- but I'm quite happy to have the tests we do. If
           | it's nontrivial logic, just test it. If it isn't (it's
           | trivial, it's aesthetic, whatever your reason)... just don't.
        
           | wesselbindt wrote:
           | >too much test coverage where the tests go through constant
           | churn
           | 
           | This doesn't sound so much as too much coverage but rather
           | like having your automated tests be coupled to implementation
            | details. This has a multitude of possible causes, for
            | example the tests being too granular (prefer testing at the
           | boundary of your system). I've worked in codebases where
           | test-implementation detail coupling was taken seriously, and
           | in those I've rarely had to write a commit message like "fix
           | tests", and all that without losing coverage.
        
             | kbolino wrote:
             | It feels like there are two levels of test writing
             | proficiency. The first is writing the tests that have high
             | benefit and low cost: e.g. pure functions with
             | comprehensive tabular tests, simple method chains that have
             | well defined sequential behavior and few dependencies, high
             | value regression tests against detailed bug reports, etc.
             | IMO it's harder to argue against writing these tests than
             | to argue for writing them.
             | 
             | Then there's the second level of proficiency, related to
             | what you're discussing with "test-implementation detail
             | coupling". This is the domain of high test coverage,
             | repeatable end-to-end tests, automated QA, etc. I've always
             | struggled with this next level and I've yet to work in any
             | environment where it was done effectively (if at all). It's
             | also harder to argue for this kind of testing because the
             | tests often end up brittle and false negatives drown out
             | the benefits.
             | 
             | Moreover, most of the discourse centers around the first
             | level of proficiency only and it's much harder to find
             | digestible advice for achieving the second.
        
             | bluefirebrand wrote:
             | > This doesn't sound so much as too much coverage but
             | rather like having your automated tests be coupled to
             | implementation details
             | 
             | Depending on how high coverage you are aiming for, I find
             | it hard to imagine a way to achieve it without inevitably
              | tying the tests to implementation details.
        
             | armchairhacker wrote:
             | Even if the tests aren't coupled to implementation details,
             | in most projects the specification itself goes through many
             | changes. Furthermore, as the implementation is being
             | changed, it stops depending on some lower-level helper code
             | and requires new code with a different purpose; the tests
             | in the old code turn out to be largely (albeit not
             | entirely) a waste of effort.
             | 
              | Changing specifications and code which turns out to be
              | unnecessary aren't ideal, but I believe they're inevitable
             | to some extent (unless the project is a narrow re-
             | implementation of something that already exists). There are
             | questions like "how will people use this product?" and
             | "what will they like/dislike about it?" that are crucial to
             | the specification yet can't be answered or even predicted
              | very well until there's already an MVP. And you can't know
             | exactly what helper classes and functions you will use to
             | implement something until you have the working
             | implementation.
             | 
             | Of course, that doesn't mean all tests are wasted effort;
             | development will be slower if the developers have to spend
             | more time debugging, due to not knowing where bugs
             | originate from, due to not having tests. There's a middle
             | ground, where you have tests to catch probable and/or
             | tricky bugs, and tests for code unlikely to be made
             | redundant, but don't spend too long on unnecessary tests
             | for unnecessary code.
        
         | jrockway wrote:
         | Testing is a continuum. I don't write a test for every change.
         | Sometimes I spend a week writing tests for a simple change.
         | 
         | I will say that I've never said "I wish I didn't write a test
         | for that". I have also never said, "your PR is fine, but please
         | delete that test, it's useless".
         | 
         | I throw away a lot of code. I still test stuff I expect to
         | throw away. That's because it probably needs to run once before
         | I throw it away, and I can't start throwing it away until it
         | works :/
         | 
         | What it comes down to is what else you have to spend your time
         | on. Sometimes you need to experiment with a feature; get it out
         | to customers, and if it's buggy and rough around the edges,
         | it's OK, because you were just trying out the idea. But
         | sometimes that's not what you want; whatever time you spend on
         | support back and forth finding a bug would have been better
         | spent not doing that. The customer needed something rock solid,
         | not an experiment. Test that so they don't have to.
         | 
         | There are no rules. "Write a test for every change" is just as
         | invalid and unworkable as "Never write any tests". It's a
         | spectrum, and each change is going to land somewhere different.
         | If you're unsure, ask a coworker. I have been testing stuff for
          | 20+ years, and I usually guess OK (that is, when I take a
          | shortcut and don't test as much as I should, it's rarely the
          | thing that caused the production outage), but a guess is just
         | that, a guess. Solicit opinions.
        
         | withinboredom wrote:
          | Also, non-testable code is often faster (as in CPU time).
        
       | whatasaas wrote:
       | Seems like an excuse that might be fine for small indie teams for
       | a while. The blog certainly blurs the lines between unit and
       | functional tests. In the end, even modest code coverage can pay
       | off. Tests help with code review, understanding the codebase, and
       | can provide an easy map for debugging. But if you're in an
       | environment where everyone is constantly demanding changes and
       | only testing the happy path, then good luck.
        
       | fwlr wrote:
       | Arguments against testing tend to fall prey to the von Neumann
       | Objection: they insist there is something tests can't catch, and
       | then they tell you precisely what it is that tests can't catch...
       | so you can always imagine writing tests for that specific thing.
       | 
       | E.g. this article uses an example of removing the number 5,
       | causing the developer to have to implement a base-9 numbering
       | system. Unit tests that confirm this custom base number system is
       | working as expected would be extremely reassuring to have.
       | Alternatively, you could keep the base-10 system everyone is
       | familiar with, and just have logic to eliminate or transform any
       | 5s. This would normally be far too risky, but high coverage
       | testing could provide strong enough assurance to trust that your
       | "patched base-10" isn't letting any 5s through.
       | 
       | The same is true for the other examples - unit testing feels like
       | the first thing I'd reach for when told about flaming numbers.
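        | 
        | To make that concrete, a sketch of the "patched base-10"
        | counter (my own toy code, not the article's; toDisplayNumber
        | is invented):
        | 
        |     #include <cassert>
        |     #include <string>
        |     
        |     // Render a count with the nine digits that remain once 5
        |     // is removed from the game: base 9 in disguise.
        |     std::string toDisplayNumber(unsigned n) {
        |         const char d[] = "012346789";
        |         if (n == 0) return "0";
        |         std::string out;
        |         while (n > 0) {
        |             out.insert(out.begin(), d[n % 9]);
        |             n /= 9;
        |         }
        |         return out;
        |     }
        |     
        |     int main() {
        |         // The property the tests guard: no 5 ever leaks out.
        |         for (unsigned n = 0; n < 100000; ++n)
        |             assert(toDisplayNumber(n).find('5')
        |                    == std::string::npos);
        |         // Spot-checks: 5 renders as "6", 9 rolls to "10".
        |         assert(toDisplayNumber(5) == "6");
        |         assert(toDisplayNumber(9) == "10");
        |     }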
        
         | magoghm wrote:
         | Tests can't catch race conditions in multithreaded code. Now
         | that I told you what the tests can't catch, can you imagine
         | writing tests for that specific thing?
        
           | a_t48 wrote:
           | I've written tests around multithreaded code, but they
           | typically catch them in a statistical manner - either running
           | a bit of code many times over to try and catch an edge
           | condition, or by overloading the system to persuade rarer
           | orderings to occur.
           | 
           | There's also
           | https://clang.llvm.org/docs/ThreadSafetyAnalysis.html which
           | can statically catch some threading issues, though I've not
           | used it much myself.
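            | 
            | Something like this toy sketch (the counter stands in for
            | whatever shared state is under test):
            | 
            |     #include <cassert>
            |     #include <thread>
            |     
            |     int counter = 0;  // unsynchronized on purpose
            |     
            |     void work() {
            |         for (int i = 0; i < 100000; ++i) ++counter;
            |     }
            |     
            |     int main() {
            |         // Repeat the race-prone section enough times that
            |         // a lost update becomes overwhelmingly likely.
            |         for (int run = 0; run < 50; ++run) {
            |             counter = 0;
            |             std::thread a(work), b(work);
            |             a.join(); b.join();
            |             // Fires on some run without proper locking.
            |             assert(counter == 200000);
            |         }
            |     }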
        
           | jonex wrote:
           | tsan will catch a bunch of potential race conditions for you,
           | under the condition that you run it somehow. How to make sure
           | it's run? Well, add a test for the relevant code and add it
           | to your tsan run in your CI and you'll certainly catch a
           | bunch of race conditions over time.
           | 
            | This has saved me a bunch of times when I've been doing work
            | in code prone to those kinds of issues. Sometimes it
           | will just lead to a flaky test, but the investigation of the
           | flake will usually find the root cause in the end.
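            | 
            | For anyone who hasn't tried it, the setup is tiny (clang
            | and gcc both take -fsanitize=thread; the race below is
            | deliberate):
            | 
            |     // clang++ -fsanitize=thread -g race_test.cpp
            |     // The test "passes", but TSan prints a data race
            |     // report and (by default) exits non-zero, failing CI.
            |     #include <cstdio>
            |     #include <thread>
            |     
            |     int shared = 0;
            |     
            |     int main() {
            |         std::thread t([] { shared = 1; });  // racy write
            |         std::printf("%d\n", shared);        // racy read
            |         t.join();
            |         return 0;
            |     }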
        
           | mistercow wrote:
           | I've written tests to do exactly that, by adding carefully
           | placed locks that allow the test to control the pace at which
           | each thread advances. It's not _fun_ but you can do it.
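            | 
            | Roughly like this (a toy version; the condition variable
            | stands in for those carefully placed locks):
            | 
            |     #include <cassert>
            |     #include <condition_variable>
            |     #include <mutex>
            |     #include <thread>
            |     
            |     std::mutex m;
            |     std::condition_variable cv;
            |     int stage = 0;
            |     
            |     void advanceTo(int s) {  // test-only pacing helpers
            |         std::unique_lock<std::mutex> lk(m);
            |         stage = s;
            |         cv.notify_all();
            |     }
            |     void waitFor(int s) {
            |         std::unique_lock<std::mutex> lk(m);
            |         cv.wait(lk, [s] { return stage >= s; });
            |     }
            |     
            |     int main() {
            |         int value = 0;
            |         std::thread t([&] {
            |             waitFor(1);   // hold worker at a known point
            |             value = 42;
            |             advanceTo(2);
            |         });
            |         assert(value == 0);  // "before" is deterministic
            |         advanceTo(1);        // release the worker
            |         waitFor(2);          // wait for it to finish
            |         assert(value == 42); // "after" is deterministic
            |         t.join();
            |     }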
        
             | magoghm wrote:
             | Doesn't inserting locks affect the memory hierarchy
             | consistency mechanisms and therefore interfere with
             | possible race conditions?
        
               | mistercow wrote:
               | That's not a situation I've encountered but "race
               | condition" is an extremely broad category.
        
           | distortionfield wrote:
           | > Tests can't catch race conditions in multithreaded code.
           | 
           | Citation needed.
           | 
           | > can you imagine
           | 
           | Yes I can, because several languages have tooling built
           | specifically for finding those race conditions.
           | 
           | If you built it, you can test it. If you can't test it, you
           | don't understand what you built.
        
         | jayd16 wrote:
         | The lesson is more about the degree of churn and how game rules
         | are not hard rules. A valid base 9 number system is NOT a
         | design goal and doing that work can be a waste.
         | 
          | It's like testing that the website landing page is blue.
          | Sure, you can, but breaking that rule is certainly valid and
          | you'll end up ripping out a lot of tests that way.
         | 
         | Now, instead of calcifying the designer's whims, testing should
         | be focused around things that actually need to make sense, ie
         | abstract systems, data structures etc etc.
        
           | fwlr wrote:
           | Tests that "calcify the designer's whims" - great way to put
           | it - can be quite useful if your job description happens to
           | be "carrying out the whims of the designer" (and for many of
           | us, it is!)
           | 
           | With high coverage and dry-ish tests, changing the tests
           | _first_ and seeing which files start failing can function as
           | a substitute for find+replace - by altering the tests to
           | reflect the whims, it'll tell you all the places you need to
           | change your code to express said whims.
        
         | HideousKojima wrote:
         | Nah, my objection to unit testing is that too often it devolves
         | into what I call "Testing that the code does what the code
         | does." If you find yourself often writing code that also
         | requires updating or rewriting unit tests, your tests are
         | mostly worthless. Unit tests are best for when you have a
         | predefined spec, or you have encountered a specific bug
         | previously and make a test to ensure it doesn't reoccur, or you
         | want to make sure certain weird edge cases are handled
         | correctly. But the obsession with things like 100% unit test
         | coverage is a counterproductive waste of time.
        
           | fwlr wrote:
           | I partially agree - I would say more specifically "those
           | situations are the easiest to write good tests for", ie
           | having a predefined spec will strongly guide you towards
           | writing good and useful tests.
           | 
           | "Testing that the code does what it does" is of course a
           | terrible waste of both the time spent writing those tests,
           | and of future time spent writing code under those tests. With
           | skill and practice at writing tests, you make that mistake
           | less often. Perhaps there's a bit of a self-fulfilling
           | prophecy for game developers: due to industry convention,
           | they're unfamiliar with writing tests, they try writing
           | tests, they end up with a superfluous-yet-restrictive test
           | suite, thus proving the wisdom of the industry convention
           | against testing.
        
       | kelseydh wrote:
        | Most video game bugs are subtle and not easy to catch with unit
        | testing, because games are dynamic systems with many interacting
        | parts. The interaction is where the bugs come from.
       | 
       | QA processes do a good job catching the rest.
        
         | lionkor wrote:
         | I would bet that a lot of those bugs come from utility code
          | that is testable.
        
           | Aerroon wrote:
            | Perhaps in development, but the stuff that tends to make it
            | into the release of games seems to be gameplay related: NPC
            | behaviors not lining up, the developer literally not
            | implementing certain stats in the game (looking at you,
            | Diablo 4), graphical bugs caused by something not loading or
            | loading too slowly, performance issues from something
            | loading 1000 copies of itself, etc.
        
       | riffraff wrote:
       | I am not convinced of the argument that games change a lot.
       | 
       | I do buy the argument that the trade off between effort and value
       | is different, but that's because it's harder to unit test user
       | interactions than it is to unit test a physics engine.
       | 
        | It's more or less the reason that, in the early life of the web,
        | few did end-to-end testing involving browsers, or unit tested
        | iOS apps in the first releases of the iPhone.
        
       | mistercow wrote:
       | I've found that the one thing you can always count on engineers
       | to do is to dismiss sensible tools from adjacent domains using
       | flimsy, post hoc justifications.
       | 
       |  _All_ product development involves poorly defined boundaries
       | where the product meets the user, where requirements shift
       | frequently, and where the burdens of test maintenance have to be
       | weighed against the benefits.
       | 
       | You don't throw out all of unit testing because it doesn't work
       | well for a subset of your code. You throw out all of unit testing
       | because writing tests is annoying, none of your coworkers have
       | set it up, and the rest of your industry doesn't do it, so you
       | feel justified in not doing it either.
        
         | stouset wrote:
         | Right. And _because_ the rest of the industry isn't doing it,
         | there's no institutional knowledge of how to do it well. So
         | someone tries it, they do a crap job of it out of
         | understandable ignorance, and rather than taking forward any
         | lessons learned the effort is discarded as a waste of time.
        
         | jayd16 wrote:
          | So wait, who's being dismissive of whose practices here?
        
           | mistercow wrote:
           | I didn't say engineers are dismissive of other engineers'
           | practices. The general pattern is "that makes sense for your
           | field, but we can't use it because..." followed by silly
           | reasons.
           | 
           | I was guilty of this myself back when I was an indie dev. It
           | took me an embarrassingly long time, for example, to admit
           | that git wasn't just something teams needed to coordinate,
           | and that I should be using it as the sole developer of a
           | project.
        
             | eviks wrote:
             | Interesting, was the impetus for change some big issue
              | you've run into? Or did a gradual accumulation of
              | knowledge about other people's experiences make you
             | reconsider? Or something else?
        
               | mistercow wrote:
               | It was a long time ago, but it was probably having to
               | switch from major feature work to emergency bug fixes
               | that finally became painful enough for me to acknowledge
               | that manual backups weren't going to cut it.
        
       | jrockway wrote:
       | Bugs are kind of the fun part of games. If every subroutine
       | worked perfectly, you wouldn't have the chaos of real life. Some
        | of players' favorite mechanics are just bugs. (Overwatch example:
       | Mercy's super jump, now a legitimate predictable mechanic that
       | everyone can do, not just people that read the forums and watch
       | YouTube videos about bugs. It started out as a bug, and it was so
       | cool and skill-ceiling increasing that now it's just part of the
       | game.)
       | 
       | Having said that, sometimes you need unit tests. Overwatch had
       | this bug where there is an ultimate ability called "amplification
       | matrix" that is a window that you shoot through and the bullets
       | do twice as much damage. One patch, that stopped working. This
       | kind of issue is pretty easy to miss in play testing; if you're
       | hitting headshots, then the bullets are doing the 2x damage they
        | would if they were body shots that got properly amplified. It is
        | very hard to tell damage numbers while play testing (as evidenced
       | by how many patches are "we made character X do 1 more damage per
       | bullet", and it smooths things out over the scale of millions of
       | matches, but isn't really that noticeable to players unless
       | breakpoints change). So for this reason, write an integration
       | test where you set up this window thingie, put an enemy behind
       | it, fire a bullet at a known point, and require.Equals(damage,
       | 200). Ya just do it, so you don't ship the bug, make real people
       | lose real MMR, and then have to "git stash" that cool thing
       | you're working on today, check out the release branch, and
       | uncomment the code that makes the amp matrix actually work. Games
       | are just software engineering. Fun software engineering. But it's
       | the same shit that your business logic brothers and sisters are
       | working on.
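        | 
        | Something like this sketch (all names invented - a toy stand-in
        | for whatever Blizzard's internal harness looks like):
        | 
        |     #include <cassert>
        |     
        |     struct World {
        |         bool ampMatrixOnPath = false;  // barrier on the shot
        |         int baseDamage = 100;
        |         int resolveShot() const {
        |             return baseDamage * (ampMatrixOnPath ? 2 : 1);
        |         }
        |     };
        |     
        |     int main() {
        |         World w;
        |         w.ampMatrixOnPath = true;  // window up, enemy behind
        |         // Body shot through the matrix must double, exactly.
        |         assert(w.resolveShot() == 200);
        |         w.ampMatrixOnPath = false; // control case
        |         assert(w.resolveShot() == 100);
        |     }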
       | 
        | (Overwatch also had a really neat bug that the community
        | believes was due to an x == 0 check instead of an x < 0 check.
        | If you pressed the right buttons while using Bastion's
        | ultimate, you had infinite ammo. Normally it fires 3 air
        | strikes, but if you got that counter to decrement twice and
        | skip the == 0 check, then you got to do it an infinite number
        | of times. (Well, actually 2^32 or 2^64 times. Eventually you'd
        | integer overflow and have the chance to hit 0 again.) Anyway,
        | this was absolutely hilarious whenever it happened in game. The
        | entire map would turn into a bunch of targets for artillery
        | shells, the noise to alert you of incoming missiles would play
        | 100 times more than normal, and it was total chaos as everyone
        | on your team died. And not even that gamebreaking; both teams
        | have the option to run the same characters, so you could just
        | do it back to your opponent. Very fun, but they fixed the bug
        | quickly.)
       | 
        | Follow up follow up: all of these silly bugs are in ultimates,
        | which come up the least often of all abilities in the games.
        | That's what happens with playtesting. You don't get test
        | coverage where you need it. A test you write covers the stuff
        | you're most scared about. A careful engineer that likes testing
        | would have never shipped these.
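        | 
        | If the speculation is right, the bug would look something like
        | this toy reconstruction (all invented, obviously):
        | 
        |     #include <cstdio>
        |     
        |     // If the counter can ever skip past zero, an equality
        |     // check never ends the ultimate.
        |     bool doneBuggy(int& strikes) {
        |         strikes -= 2;          // double decrement skips zero
        |         return strikes == 0;   // 3 -> 1 -> -1 -> ... never 0
        |     }
        |     bool doneFixed(int& strikes) {
        |         strikes -= 2;
        |         return strikes <= 0;   // catches the skip
        |     }
        |     
        |     int main() {
        |         int a = 3, b = 3;
        |         while (!doneBuggy(a) && a > -10) {}  // "infinite"
        |         while (!doneFixed(b)) {}             // stops at -1
        |         std::printf("buggy: %d, fixed: %d\n", a, b);
        |     }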
        
         | AndyPa32 wrote:
         | > Games are just software engineering. Fun software
         | engineering.
         | 
         | I do question the "fun" part. Midnight crunches, unpaid
         | overtime and - as far as I have read - some of the worst
         | working conditions in all of software engineering. I pass.
        
           | jrockway wrote:
           | That is probably true, but you know you suffer for your art
           | and all that. People don't really like software, but they
           | love games. We know that games are just software, but it's so
           | fun, that people forget that. It's pretty cool. Though to me,
           | I kind of like getting 8 hours of sleep a night and playing
           | other people's games. While getting paid more :/
        
       | rkachowski wrote:
       | It's an interesting idea, but here you have the game designer
       | taking the place of the product manager stereotype - coming up
       | with bizarre unfeasible ideas and the programmer is to make it
       | happen.
       | 
        | In any games company I've worked for, the designer is
        | responsible for mapping and balancing the rules and mechanics
        | of the game; they would provide a specification of what "red vs
        | blue numbers" would look like and a balanced idea of how to
        | remove the number 5 from the game (balancing and changing the
        | rules like this being entirely within the domain of game
        | design). Incidentally, any game
       | company I've worked at has had an extensive set of test suites.
        
       | epgui wrote:
       | What nonsense that is... The idea that intentional mathematical
       | design / correct, well-specified behaviour doesn't apply to games
       | is absurd.
        
       | SillyUsername wrote:
       | Ok I know which AAA game studio this might be because I
       | interviewed with them and had to sign an NDA.
       | 
        | In their case their flagship game is full of bugs, and they had
        | to ship their product asap, pre-acquisition, when they were a
        | startup.
       | 
        | Because of the mentality of the managers, and weak-minded devs,
       | they don't write unit tests, and instead spend the vast majority
       | of their days fighting bugs, so much so they have to hire
       | dedicated staff for their (single game) backlog as they were
       | struggling to keep up "with its success".
       | 
        | This is BS of course; I saw their backlog and it was a shit
        | show, with devs expected to work overtime free of charge to get
        | actual features out (funny how this works, isn't it - it never
        | affects the time/life of the business execs who make the
        | demands of no tests).
       | 
       | I was asked what I would bring to the company to help them
       | support their now AAA game, and I stated up front "more unit
       | tests" and indirectly criticised their lack of them. I got a call
       | later that day that (the manager thought) "I would not be a good
       | fit".
       | 
       | I got a lead job elsewhere that has the company's highest
       | performing team, literally because of testing practices being
       | well balanced between time and effectiveness (i.e. don't bother
        | with low-value tests, add tests if you find a bug, etc.; if an
        | integration test takes too long, leave it and use unit tests).
       | 
       | I think back to that interview every time I interview at games
       | studios now, and wonder if I shouldn't push unit tests if they're
       | missing. I'd still do it. The managers at that job were assholes
       | to their developers, and I now recognise the trait in a company.
        
       | Joel_Mckay wrote:
        | 1. Most game engines have the horrible compatibility layers
        | abstracted away, already fully tested under previous mass
        | deployments.
        | 
        | 2. Anything primarily visual, audio, or control-input based is
        | extremely hard to test reliably via automation. Thus, if the
        | clipping glitches are improbable and hardly noticeable... no one
        | cares.
       | 
       | Some people did get around the finer game-play issues by simply
       | allowing their AI character to cheat. Mortal Kombat II was famous
       | for the impossible moves and combos the AI would inflict on
       | players... yet the release was still super popular, as people
       | just assumed they needed more practice with the game.
       | 
       | Have fun out there, =)
        
       | throwaway115 wrote:
       | Having written a 3d game engine from scratch, I had automated
       | tests, but they were more comparable to "golden" tests, which are
       | popular in the UI test world. Basically, my renderer needed to
       | produce a pixel-perfect frame. If a pixel didn't match, an image
       | diff was produced. This saved my butt numerous times when I broke
       | subtle parts of the renderer.
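        | 
        | For the curious, the core of such a test is very little code (a
        | toy sketch; my real one encoded the diff to an image file):
        | 
        |     #include <cstddef>
        |     #include <cstdint>
        |     #include <cstdio>
        |     #include <vector>
        |     
        |     using Frame = std::vector<uint8_t>;  // RGBA8, row-major
        |     
        |     // Byte-for-byte compare; diff gets 255 where they differ.
        |     bool framesMatch(const Frame& got, const Frame& want,
        |                      Frame& diff) {
        |         if (got.size() != want.size()) return false;
        |         diff.assign(want.size(), 0);
        |         bool ok = true;
        |         for (std::size_t i = 0; i < want.size(); ++i)
        |             if (got[i] != want[i]) { diff[i] = 255; ok = false; }
        |         return ok;
        |     }
        |     
        |     int main() {
        |         Frame golden = {255, 0, 0, 255};  // 1x1 red reference
        |         Frame render = {255, 0, 0, 255};  // renderer's output
        |         Frame diff;
        |         if (!framesMatch(render, golden, diff)) {
        |             std::fprintf(stderr, "frame mismatch\n");
        |             return 1;  // real version: write diff as a PNG
        |         }
        |     }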
        
       | vmaurin wrote:
       | > why game devs don't unit test
       | 
        | Sources?
        
       | PlunderBunny wrote:
       | I wrote a game using a 'bottom up' design (i.e. IO and business
       | logic first), and I wrote unit tests for the business logic as I
       | went. With no UI, I effectively tested and stepped through my
       | code with unit tests. I had the luxury of working by myself at my
       | own pace.
       | 
       | I have a reasonably clean separation between the UI and the rest
       | of the code, but I don't have any unit tests for the UI (I think
       | - correct me if I'm wrong here - that would require integration
       | tests rather than unit tests?) What I'm trying to say is that, if
       | you don't do it this way around, and/or you have multiple
       | programmers writing the game at once, and/or you _really_
       | optimise for performance, I can imagine that would make it much
       | harder to write unit tests.
        
       | smokel wrote:
       | "It's _never_ good to be dogmatic. "
       | 
       | In some situations unit tests can be very effective and useful,
       | such as in testing complex algorithms, or in code bases where
        | some serious refactoring is required, and where one doesn't want to
       | break existing behavior. In backend development, where user
       | facing output is limited, there is typically no other practical
       | way to check that things are working properly.
       | 
       | However, in games, and typical front-end development, especially
       | in its early stages, it can be beneficial to be as flexible as
        | possible. And however you put it, unit tests simply make your
       | code more rigid.
       | 
       | In the latter situation, some people prefer guard rails and find
       | that they are more flexible with unit tests in place. Others
       | prefer not to care about unit tests and attain higher
       | productivity without them.
       | 
        | Only when an application grows to a certain size, where a
        | developer does not naturally inspect typical behavior all day,
        | and quality becomes important, does it start to make sense to
        | put in automated testing, because it is simply more cost
        | effective.
       | 
       | Similar reasoning goes for dynamic vs static typing.
       | 
       | It seems that some people think that everyone should _always_ use
       | the same approach for any kind of software development, because
       | it worked for them at some point in time. Over time I have grown
       | a preference to avoid working with such people.
        
       | follower wrote:
        | For an alternative perspective on testing & game development,
        | here's a video I saw a few years ago:
       | 
       | * "Automated Testing of Gameplay Features in 'Sea of Thieves'":
       | https://www.youtube.com/embed/X673tOi8pU8?si=uj_lcMEC9nvMpa6...
       | 
       | ~via https://www.gdcvault.com/play/1026366/Automated-Testing-
       | of-G... :
       | 
       | "Automated testing of gameplay features has traditionally not
       | been embraced by the industry, due to the perceived time required
       | and difficulty in creating reliable tests. Sea of Thieves however
       | was created with automated testing for gameplay from the start.
       | This session explains why automated testing was the right choice
       | for Sea of Thieves and how it could benefit your game. It shows
       | the framework that was built by Rare to let team members create
       | automated tests quickly and easily, the different test types we
       | built, and what level of test coverage was found to be
       | appropriate. The session also contains best practices for making
       | tests work reliably and efficiently, using clear worked through
       | examples."
       | 
        | Looks like there are also related talks in later years (which may
       | or may not be currently available as free-to-view--I've not
       | watched these ones):
       | 
       | * "Lessons Learned in Adapting the 'Sea of Thieves' Automated
       | Testing Methodology to 'Minecraft'":
       | https://www.gdcvault.com/play/1027345/Lessons-Learned-in-Ada...
       | 
       | * "Automated Testing of Shader Code" (GDC 2024):
       | https://schedule.gdconf.com/session/automated-testing-of-sha...
        
       | frou_dh wrote:
       | This makes me think of the claim you sometimes see that memory-
       | safety is not that relevant for game development because in many
       | cases games aren't security-sensitive software. But even putting
       | security vulnerabilities aside completely, plain old memory
       | corruption can be a major drag when it rears its head (and can
       | even kill projects if the game can't be wrangled into being
        | crash-free by the deadline). This particularly applies to games
        | with huge codebases and large numbers of programmers.
        
       | larsrc wrote:
       | Disclaimer: never was a game dev
       | 
       | You're conflating unit tests and functional/integration tests
       | there. A unit test should test that a single
       | function/method/class does what it's expected to do. The game
       | design changes should change how they are put together, but not
       | often what they do. If your setThingOnFire() method suddenly also
       | flips things upside down, you're going to have a bad day. Instead
       | your callers should add calls to flipThingUpsideDown().
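        | 
        | In test form (a minimal sketch around the hypothetical methods
        | above):
        | 
        |     #include <cassert>
        |     
        |     struct Thing {
        |         bool onFire = false;
        |         bool upsideDown = false;
        |         void setThingOnFire() { onFire = true; }
        |         void flipThingUpsideDown() { upsideDown = !upsideDown; }
        |     };
        |     
        |     int main() {
        |         Thing t;
        |         t.setThingOnFire();
        |         assert(t.onFire);       // does what it says...
        |         assert(!t.upsideDown);  // ...and nothing else; the
        |                                 // design change belongs in the
        |                                 // caller, not in this unit
        |         t.flipThingUpsideDown();
        |         assert(t.upsideDown);
        |     }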
        
       | bentt wrote:
        | I don't see why design needs to have so much impact on whether
        | there are unit tests. Most unit tests should be much lower level
        | than anything which would be balanced or user-tested out of
        | relevance. You want stuff like "can the player character jump
        | and clear these sets of obstacles which we are building all of
        | our levels with?", and then a nice script that takes prerecorded
        | input and sees if the outcome is deterministic. So, this way, if
        | someone inadvertently changes gravity, or friction, or whatever
        | in the basic systems that determine locomotion, you'll catch it
        | early.
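        | 
        | As a sketch (toy fixed-timestep sim, numbers invented), the
        | prerecorded-input idea boils down to:
        | 
        |     #include <cassert>
        |     #include <vector>
        |     
        |     struct Player { double x = 0, y = 0, vy = 0; };
        |     
        |     void step(Player& p, bool jump) {
        |         if (jump && p.y == 0) p.vy = 5;     // jump impulse
        |         p.y += p.vy;
        |         p.vy -= 1;                          // gravity per tick
        |         if (p.y < 0) { p.y = 0; p.vy = 0; } // land
        |         p.x += 1;                           // run speed
        |     }
        |     
        |     int main() {
        |         // Prerecorded input: jump on the first tick only.
        |         std::vector<bool> inputs(12, false);
        |         inputs[0] = true;
        |         Player p;
        |         for (bool in : inputs) step(p, in);
        |         // Golden outcome: tweak gravity or the impulse and
        |         // the landing position changes, failing this early.
        |         assert(p.x == 12 && p.y == 0);
        |     }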
       | 
       | Now, do most game makers take the time to do this? No, because
       | they will likely have a lot else to do and make an excuse not to
       | do it. However, for the most vital tech foundations, it is a good
       | idea.
       | 
       | What gamedev does tend to do more often is smoke testing. Just
       | load up each level and see if it runs out of memory or not.
       | Automated testing on build to see if something broke. It's less
       | granular than unit testing, but when you're building over and
       | over in the heat of a closeout on a project, this type of thing
       | can tease out a breaking bug early as well.
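        | 
        | A smoke pass can be as dumb as this (toy sketch; loadLevel and
        | memoryUsedBytes are invented stand-ins for engine calls):
        | 
        |     #include <cassert>
        |     #include <string>
        |     #include <vector>
        |     
        |     // Hypothetical engine hooks.
        |     bool loadLevel(const std::string& name) { return true; }
        |     long memoryUsedBytes() { return 100L << 20; }
        |     
        |     int main() {
        |         const std::vector<std::string> levels =
        |             {"intro", "caves", "boss"};
        |         const long budget = 512L << 20;  // 512 MB cap
        |         for (const auto& lvl : levels) {
        |             assert(loadLevel(lvl));             // it loads...
        |             assert(memoryUsedBytes() < budget); // ...and fits
        |         }
        |     }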
       | 
       | Overall, I like the title of the OP article, but not much that's
       | said within.
        
       ___________________________________________________________________
       (page generated 2024-05-25 23:02 UTC)