[HN Gopher] Unit testing a TCP stack (2015)
       ___________________________________________________________________
        
       Unit testing a TCP stack (2015)
        
       Author : PuercoPop
       Score  : 120 points
       Date   : 2021-08-26 06:15 UTC (2 days ago)
        
 (HTM) web link (www.snellman.net)
 (TXT) w3m dump (www.snellman.net)
        
       | gravypod wrote:
       | There's a lot of pushback from engineers - especially people at
       | lower levels of the stack - against testing infrastructure. One
       | particularly famous example is Linux. Rather than testing before
       | merging in code, they merge in code and then test the release
       | candidate as a whole. It also seems game developers are extremely
       | against automated testing frameworks as a whole. I've heard many
       | times that it would be impossible to develop an enemy AI in a
       | test-driven way (I did this for a senior project in college -
       | finished the AI before the game was able to even start testing it
       | [0]).
       | 
       | I wonder what would need to happen to convince people that:
       | 
       | 1. Even if you do something extremely low level, you can draw a
       | distinction between your hardware and the interface that 99% of
       | your software runs at.
       | 
       | 2. You can develop complex behaviors iteratively with automated
       | testing just like you can develop complex programs iteratively
       | (tests are just programs).
       | 
       | [0] - https://github.com/gravypod/it491-disabler-ai
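
A minimal sketch of point 1 above, in Python: the logic is written against a narrow interface, and a test substitutes a fake for the hardware. The names `Gpio`, `FakeGpio`, `signal_error` and the pin number are all hypothetical, chosen only for illustration.

```python
from typing import Protocol


class Gpio(Protocol):
    """The narrow interface the rest of the code is written against."""

    def write(self, pin: int, high: bool) -> None: ...


class FakeGpio:
    """Test double: records pin writes instead of touching real hardware."""

    def __init__(self) -> None:
        self.levels = {}

    def write(self, pin: int, high: bool) -> None:
        self.levels[pin] = high


def signal_error(gpio: Gpio, led_pin: int = 13) -> None:
    """Hardware-independent logic: switch a (hypothetical) error LED on."""
    gpio.write(led_pin, True)


# The behaviour can be checked without any device attached.
fake = FakeGpio()
signal_error(fake)
assert fake.levels == {13: True}
```

Only the class that implements `Gpio` for the real device needs the hardware; everything written against the interface can run in an ordinary test process.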
        
         | axguscbklp wrote:
         | Or maybe those people know what they are talking about and the
         | truth is that many people who are big fans of automated testing
         | tend to overrate its value when it comes to many areas of
         | software development. Testing takes effort and makes it harder
         | to change things. In an ideal world where testing came for free
         | then sure, more testing would be better. In the real world
         | there are tradeoffs. If I am writing code that controls a
         | spaceship then it makes sense to spend a huge amount of effort
         | on testing. On the other hand, if I am adding a feature to a
         | web application then in my personal experience, most of the
         | time adding automated testing is a waste of effort.
        
           | gravypod wrote:
           | > Or maybe those people know what they are talking about and
           | the truth is that many people who are big fans of automated
           | testing tend to overrate its value when it comes to many
           | areas of software development.
           | 
           | There's been a lot of research, and internal studies, done at
           | many companies that show pretty impressive benefits.
           | 
           | When really questioned most engineers just say "I know my
           | code works" or "I test my code, I don't need automated
           | tests". That's the mentality I just don't understand.
           | 
           | > Testing takes effort and makes it harder to change things.
           | 
           | If it "makes things hard to change" just delete the test?
           | You'll still get the benefit of knowing XYZ are
           | broken/altered. You can also automate end-to-end and black
           | box tests which should absolutely not require any
           | modification if you're just refactoring.
           | 
           | > If I am writing code that controls a spaceship then it
           | makes sense to spend a huge amount of effort on testing. On
           | the other hand, if I am adding a feature to a web application
           | then in my personal experience, most of the time adding
           | automated testing is a waste of effort.
           | 
           | If you are working on something that is allowed to fail, then
           | sure, you don't really need to care about what practices you
           | do. It's a very end-all-be-all argument to say "it's ok for
           | my things to break". That argument goes just the same for all
           | of these things:
           | 
           | "Why do I need a version control system? It's fine if I
           | manually merge my code incorrectly"
           | 
           | "Why do I need a build system? It's fine if I forget to
           | recompile one of my files"
           | 
           | etc.
           | 
           | In addition: the "argument" for automated testing isn't that
           | it will just prevent you from breaking something. It's that
           | it lets you know when things change and makes it easy to
           | update your code without manually checking if things are
           | broken. Recently, when adding features to our frontend, I
           | just run our tests and update a png file in our repo. I then
           | play around until my styling is how I like it. It's
           | completely automated and saves me a lot of time. It also lets
           | others know immediately when their CSS change will affect, or
           | will not affect, my components.
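
The "run the tests and update a file in the repo" workflow described above resembles what is often called golden-file (or snapshot) testing. A small Python sketch under that assumption: `render_button` is a stand-in that returns markup bytes rather than a real PNG, and `check_golden` is a hypothetical helper, not the commenter's actual tooling.

```python
import tempfile
from pathlib import Path


def render_button(label: str) -> bytes:
    """Stand-in for a real renderer that would produce PNG bytes."""
    return f"<button>{label}</button>".encode()


def check_golden(actual: bytes, golden: Path, update: bool = False) -> bool:
    """Compare output against the checked-in golden file.

    With update=True (or when no golden file exists yet) the file is
    (re)written, which corresponds to accepting the new appearance.
    """
    if update or not golden.exists():
        golden.write_bytes(actual)
        return True
    return golden.read_bytes() == actual


with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "button.golden"
    assert check_golden(render_button("OK"), golden, update=True)
    assert check_golden(render_button("OK"), golden)          # unchanged output
    assert not check_golden(render_button("Cancel"), golden)  # change detected
```

Because the golden file lives in the repository, a change in rendering shows up as a failing test or as a diff in review, which is the "lets others know" effect mentioned above.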
        
             | axguscbklp wrote:
             | >There's been a lot of research, and internal studies, done
             | at many companies that show pretty impressive benefits.
             | 
             | I would need to look at the research and studies to see
             | whether I actually believe them. I have seen how
             | politicized and tribal technical decisions can become at a
             | company and can easily imagine that there might be
             | confounding variables.
             | 
             | >When really questioned most engineers just say "I know my
             | code works" or "I test my code, I don't need automated
             | tests". That's the mentality I just don't understand.
             | 
             | If I do not write automated tests and then my code works
             | fine in production 95% of the time, and out of the 5% of
             | the time that it breaks, 99% of the time it is a problem
             | that is easily fixed and causes no major problems, and even
             | the few major problems are of the "lose a manageable amount
             | of money" kind and not the "people get injured or killed"
             | kind - meanwhile if, on the other hand, writing automated
             | tests would add 50% more effort to my work - then the
             | cost/benefit analysis might suggest that I should not write
             | automated tests. Keep in mind that I could spend that 50%
             | more effort instead doing things that will make my code
             | less likely to break but that are not automated testing.
             | "not adding automated testing" is not the same thing as
             | "not doing anything that will make the code less likely to
             | break".
             | 
             | >If it "makes things hard to change" just delete the test?
             | You'll still get the benefit of knowing XYZ are
             | broken/altered.
             | 
             | If I delete the test then I will have wasted part of the
             | effort that went into writing it. Usually when I change
             | code I already know that things will work differently
             | afterward and that things might break, so this would not
             | tell me anything I did not already know.
             | 
             | >You can also automate end-to-end and black box tests which
             | should absolutely not require any modification if you're
             | just refactoring.
             | 
             | Agreed - I am much more friendly towards end-to-end tests
             | than towards unit tests. I still would not advocate them
             | dogmatically, but I find them to be more useful than unit
             | tests.
             | 
             | >If you are working something that is allowed to fail, then
             | sure, you don't really need to care about what practices
             | you do. It's a very end-all-be-all argument to say "it's ok
             | for my things to break". That argument goes just the same
             | for all of these things:
             | 
             | I think that you might be seeing things in too binary a
             | way. The vast majority of software products are allowed to
             | fail, but not all of them are allowed to fail to the same
             | degree. The practices do matter and there is no one-size-
             | fits-all approach to testing. The context matters. What
             | rate of failure is acceptable such that to try to prevent
             | failure beyond that rate would actually be
             | counterproductive? What specific impacts would more vs.
             | less testing have on the development process? How do the
             | relevant pros and cons fit into the overall goal of the
             | organization? Etc.
             | 
             | I am not saying "it's ok for my things to break", nor are
             | probably most people who question automated testing dogma
             | saying "it's ok for my things to break". We are saying that
             | there are tradeoffs. Sometimes adding more automated
             | testing does not actually add value. Again, it depends on
             | the exact context.
             | 
             | Regarding the rest of what you wrote: I am not disputing
             | that automated testing can bring lots of benefits. I am
             | just saying that I think some people are too dogmatic about
             | it, see it too much as a magic pill, push for its use too
             | broadly, and do not take the relevant tradeoffs
             | sufficiently into account.
        
         | lloydatkinson wrote:
         | I agree - it seems it's only excuses that keep certain devs
         | away from testing. Even Factorio manages testing, though it's
         | more of an integration test. I'm sure it could be done with
         | unit tests too.
        
           | gravypod wrote:
           | I think even integration tests would be a huge improvement
           | for most teams and software. As long as some automated checks
           | are run before code is merged you're going to save yourself a
           | lot of heart ache.
        
         | PezzaDev wrote:
         | Automated testing is hard to justify for games because you are
         | not simply finishing features and moving on. You are constantly
         | experimenting. Throwing out features and ideas and reworking
         | them is part of the creative process. Automated testing adds
         | overhead to iteration and isn't free, so you have to be very
         | selective. At the end of the day it is more important for the
         | game to be fun than for it to be stable.
        
           | jsnell wrote:
           | Though there are games where the fun has been proven, and
           | it's important to be able to iterate fast without breaking
           | what's already there. There's a great 2016 blog post from
           | Riot on how they test LoL:
           | 
           | https://technology.riotgames.com/news/automated-testing-
           | leag...
        
         | [deleted]
        
         | foxfluff wrote:
         | > I wonder what would need to happen to convince people that ..
         | 
         | 3. It's worth it.
         | 
         | I work at (relatively) low levels, and I would absolutely love
         | to have extensive tests (plus more, e.g. TLA+ models to prove
         | critical properties of the systems I work on).
         | 
         | The pushback comes from stakeholders. They don't want to invest
         | time and money into automated testing.
         | 
         | And when no automated testing has been done yet, you can guess
         | that the system hasn't been architected to be easily testable.
         | Figuring out how to add useful tests without massive (time-
         | consuming and expensive, potentially error-prone) re-
         | architecting is also something that requires quite a bit of
         | investment.
         | 
         | Of course a part of it is just lack of experience. If someone who
         | knows how it's done could lead by example and show the ropes,
         | that'd probably help. Getting the framework off the ground
         | could be the key to sneaking in some tests in the future, even
         | when nobody asks for them.
        
           | rad_gruchalski wrote:
           | I did tests for something like this once. Not low level, more
           | of a set of 30+ microservices but the concept is there. The
           | black box testing. This was for a smart home solution based
           | on RabbitMQ. The client wanted to replace RabbitMQ with Kafka
           | but they were anxious because there was no way to verify that
           | the replacement would behave the same way.
           | 
           | So we spent 2 months writing black box tests against the
           | RabbitMQ version, swapped it out with Kafka and fixed all
           | issues within a couple of weeks.
           | 
           | Since then, I believe that the integration tests are so much
           | more valuable than unit tests.
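
The approach described above (write black-box tests against the old system, then run the same tests against the replacement) can be sketched in Python as one behavioural check applied to two implementations. `DequeBroker` and `LogBroker` are toy in-memory stand-ins invented for this example; they do not use or model the real RabbitMQ or Kafka client APIs.

```python
from collections import deque
from typing import Callable, Optional, Protocol


class Broker(Protocol):
    """The only surface the black-box test is allowed to touch."""

    def publish(self, topic: str, msg: bytes) -> None: ...
    def poll(self, topic: str) -> Optional[bytes]: ...


class DequeBroker:
    """Stand-in for the original system: one FIFO queue per topic."""

    def __init__(self) -> None:
        self._queues = {}

    def publish(self, topic: str, msg: bytes) -> None:
        self._queues.setdefault(topic, deque()).append(msg)

    def poll(self, topic: str) -> Optional[bytes]:
        queue = self._queues.get(topic)
        return queue.popleft() if queue else None


class LogBroker:
    """Stand-in for the replacement: append-only log plus a read offset."""

    def __init__(self) -> None:
        self._logs = {}
        self._offsets = {}

    def publish(self, topic: str, msg: bytes) -> None:
        self._logs.setdefault(topic, []).append(msg)

    def poll(self, topic: str) -> Optional[bytes]:
        log = self._logs.get(topic, [])
        offset = self._offsets.get(topic, 0)
        if offset >= len(log):
            return None
        self._offsets[topic] = offset + 1
        return log[offset]


def check_fifo_per_topic(make_broker: Callable[[], Broker]) -> None:
    """Black-box expectation: messages arrive in order within each topic."""
    broker = make_broker()
    broker.publish("lights", b"on")
    broker.publish("lights", b"off")
    broker.publish("heating", b"21C")
    assert broker.poll("lights") == b"on"
    assert broker.poll("heating") == b"21C"
    assert broker.poll("lights") == b"off"
    assert broker.poll("lights") is None


# The same check runs unchanged against both implementations.
for factory in (DequeBroker, LogBroker):
    check_fifo_per_topic(factory)
```

The test never looks inside either class, so swapping the internals does not require touching it.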
        
       | jsnell wrote:
       | One benefit we discovered with this test framework after the blog
       | post was written was that it made it much more convenient to do
       | fuzzing and differential testing of the TCP stack. The core
       | problem with fuzzing TCP is that there's a lot of incrementally
       | built up state, and everything is extremely timing-dependent.
       | 
       | You basically need the fuzzer to have a model of TCP state so
       | that it can effectively explore the state space, which is quite
       | complicated and not something you can do with off-the-shelf
       | tools.
       | 
       | But once you have a bunch of unit tests designed to put the TCP
       | stack into a specific state + a way of saving and restoring that
       | state, it's really easy to just have snapshots of interesting
       | situations where you can run a fuzzer on the next packet to be
       | transmitted and see what happens.
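
A rough Python sketch of the snapshot-then-fuzz idea described above. `ToyConn`, its states, and the checked invariants are invented for illustration and are far simpler than real TCP; `copy.deepcopy` stands in for whatever save/restore mechanism the actual framework uses.

```python
import copy
import random

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10
VALID_STATES = {"LISTEN", "SYN_RCVD", "ESTABLISHED", "CLOSE_WAIT", "CLOSED"}


class ToyConn:
    """Toy receiver-side connection; NOT a real TCP implementation."""

    def __init__(self) -> None:
        self.state = "LISTEN"
        self.rcv_nxt = 0

    def on_segment(self, flags: int, seq: int, payload: bytes = b"") -> None:
        if flags & RST:
            self.state = "CLOSED"
        elif self.state == "LISTEN" and flags & SYN:
            self.state, self.rcv_nxt = "SYN_RCVD", (seq + 1) % 2**32
        elif self.state == "SYN_RCVD" and flags & ACK:
            self.state = "ESTABLISHED"
        elif self.state == "ESTABLISHED" and seq == self.rcv_nxt:
            self.rcv_nxt = (seq + len(payload)) % 2**32
            if flags & FIN:
                self.state = "CLOSE_WAIT"


def established_snapshot() -> ToyConn:
    """What a unit test would do: drive the connection into a known state."""
    conn = ToyConn()
    conn.on_segment(SYN, seq=1000)
    conn.on_segment(ACK, seq=1001)
    return conn


def fuzz_next_segment(snapshot: ToyConn, rounds: int, seed: int = 0) -> set:
    """Restore the snapshot each round and feed it one random segment."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(rounds):
        conn = copy.deepcopy(snapshot)              # restore saved state
        flags = rng.randrange(0x40)                 # random flag bits
        seq = (snapshot.rcv_nxt + rng.randint(-2, 2)) % 2**32
        payload = bytes(rng.randrange(4))           # 0 to 3 zero bytes
        conn.on_segment(flags, seq, payload)
        # Invariants that should hold whatever the input was.
        assert conn.state in VALID_STATES
        assert conn.rcv_nxt in (
            snapshot.rcv_nxt,
            (snapshot.rcv_nxt + len(payload)) % 2**32,
        )
        seen.add(conn.state)
    return seen


reached = fuzz_next_segment(established_snapshot(), rounds=200)
```

Every round starts from the same established connection, so the fuzzer spends its effort on the next packet rather than on rediscovering the handshake.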
        
       | 10000truths wrote:
       | It would be nice to have a bring-your-own-I/O TCP stack library
       | that *doesn't* rely on custom callbacks - something like BearSSL
       | but for TCP, where the stack is just a pure state machine object
       | and the user is responsible for explicitly shunting packets to
       | and from the state machine, retaining control over when and how
       | the I/O is done. Instead of having to define callbacks for
       | retrieving time and consuming packets, why not explicitly pass
       | the timestamp and packet data to a state machine object via a
       | direct function call?
        
         | Slix wrote:
         | Is Cloudflare's Quiche QUIC library
         | https://github.com/cloudflare/quiche similar to what you're
         | looking for? All I/O must be done by the caller.
        
           | 10000truths wrote:
           | Yeah, that's the general idea. Essentially a state machine
           | with a send queue and receive queue, and four operations:
           | 
           | - Input received raw data
           | 
           | - Output received application data
           | 
           | - Input application data to send
           | 
           | - Output raw data to send
           | 
           | Obviously, since TCP connection state is time sensitive, the
           | "raw data" wouldn't just be the IP packet and headers, but
           | also a time stamp telling the state machine when that packet
           | was received/sent. If you want the state machine to keep
           | track of time even when no packets are being received or
           | sent, there could be an additional operation just to input a
           | timestamp without additional packets. In effect, time is just
           | another input that the user is responsible for feeding to the
           | state machine at sufficiently fine intervals.
           | 
           | In practice, you could emulate this pattern with a callback-
           | oriented protocol stack by populating an in-memory
           | send/receive queue in your callback function, but that design
           | can be somewhat inflexible because it forces potentially
           | undesirable constraints, e.g. an extra memory copy that could
           | otherwise be elided.
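
The four operations plus the explicit time input described above could look roughly like this in Python. This is a toy stop-and-wait protocol, not TCP: the 5-byte header, the fixed `RTO`, the tiny `MSS`, and every method name are assumptions made up for the sketch. The point is only that the object performs no I/O and never reads a clock itself.

```python
import struct

HEADER = struct.Struct("!cI")  # toy header: type byte + 32-bit number
RTO = 1.0                      # fixed retransmission timeout (toy value)
MSS = 4                        # tiny segment size to force several segments


class ToyEndpoint:
    """Pure state machine: the caller moves all bytes and supplies the time."""

    def __init__(self) -> None:
        self.now = 0.0
        self.send_buf = bytearray()   # application data not yet sent
        self.recv_buf = bytearray()   # in-order data not yet read
        self.out = []                 # raw segments waiting for the caller
        self.snd_una = 0              # first unacknowledged byte
        self.rcv_nxt = 0              # next byte expected from the peer
        self.inflight = None          # (payload, time_sent) or None

    def tick(self, now: float) -> None:
        """Input a timestamp only; may queue a new segment or a retransmit."""
        self.now = now
        if self.inflight is None and self.send_buf:
            payload = bytes(self.send_buf[:MSS])
            del self.send_buf[:MSS]
            self._transmit(payload)
        elif self.inflight and now - self.inflight[1] >= RTO:
            self._transmit(self.inflight[0])

    def _transmit(self, payload: bytes) -> None:
        self.inflight = (payload, self.now)
        self.out.append(HEADER.pack(b"D", self.snd_una) + payload)

    def receive_raw(self, now: float, segment: bytes) -> None:
        """Input received raw data, together with its arrival time."""
        kind, num = HEADER.unpack_from(segment)
        payload = segment[HEADER.size:]
        if kind == b"D":
            if num == self.rcv_nxt:               # accept in-order data only
                self.recv_buf += payload
                self.rcv_nxt += len(payload)
            self.out.append(HEADER.pack(b"A", self.rcv_nxt))
        elif (kind == b"A" and self.inflight
              and num == self.snd_una + len(self.inflight[0])):
            self.snd_una, self.inflight = num, None
        self.tick(now)

    def read_app(self) -> bytes:
        """Output received application data."""
        data = bytes(self.recv_buf)
        self.recv_buf.clear()
        return data

    def write_app(self, data: bytes) -> None:
        """Input application data to send."""
        self.send_buf += data

    def poll_raw(self) -> list:
        """Output raw data to send; the caller decides how and when."""
        out, self.out = self.out, []
        return out


def pump(x: ToyEndpoint, y: ToyEndpoint, now: float) -> None:
    """Test harness: shuttle segments both ways until both sides are quiet."""
    moved = True
    while moved:
        moved = False
        for src, dst in ((x, y), (y, x)):
            for seg in src.poll_raw():
                dst.receive_raw(now, seg)
                moved = True


a, b = ToyEndpoint(), ToyEndpoint()
a.write_app(b"hello!")
a.tick(0.0)
a.poll_raw()                 # simulate loss: the first segment is dropped
a.tick(0.5)
assert a.poll_raw() == []    # timeout not reached, nothing to send
a.tick(1.0)                  # timeout reached, retransmit is queued
pump(a, b, 1.0)
assert b.read_app() == b"hello!"
```

Because time is just another argument, the loss-and-retransmit scenario runs instantly and deterministically, with no sleeping and no mocked clock.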
        
       | cjfd wrote:
       | Yes, a TCP stack certainly is complex enough to warrant serious
       | automated testing and/or TDD.
       | 
       | The idea of putting the TCP stack in user space is interesting.
       | If one actually could map the memory of the whole device into
       | user space one could maybe have fewer system calls and therefore
       | have better performance.
       | 
       | Also, what I find somewhat irritating about using a linux system
       | is how often one needs to run commands as root (sudo) for common
       | administrative tasks like mounting a disk or stuff like that.
       | Having a user space TCP stack could also decrease the need for
       | that as far as setting up the network is concerned. If the linux
       | machine is single user, as most of them are nowadays, it makes
       | more sense that way, I think.
        
         | avinassh wrote:
         | > The idea of putting the TCP stack in user space is
         | interesting.
         | 
         | Indeed! Julia Evans wrote a really nice post explaining the
         | use cases and benefits - https://jvns.ca/blog/2016/06/30/why-do-
         | we-use-the-linux-kern...
        
         | rmetzler wrote:
         | > one needs to run commands as root (sudo) for common
         | administrative tasks like mounting a disk
         | 
         | I would think if you don't do this, an attacker who is able to
         | execute code but is not yet root could easily elevate
         | permissions by shadowing legitimate paths and tricking root into
         | executing untrusted code.
         | 
         | I'm not a security engineer and just find it interesting, so if
         | my thinking is off, please correct me.
        
         | stefan_ wrote:
         | The whole "map the PCIE device into userspace process memory"
         | thing is called DPDK (https://www.dpdk.org/)
        
           | ay wrote:
           | And you can combine the two:
           | 
           | https://fd.io/docs/vpp/master/whatisvpp/hoststack.html
           | 
           | And there is a sister project using this tech to get
           | noticeable speed-ups:
           | 
           | https://wiki.fd.io/view/VSAP
           | 
           | Disclaimer: I am involved with the VPP project.
        
         | stingraycharles wrote:
         | > Having a user space TCP stack could also decrease the need
         | for [root privileges] as far as setting up the network is
         | concerned.
         | 
         | I think it's important to distinguish between the protocol
         | (TCP) and the hardware device. You would still absolutely need
         | to talk to the device, it's just that moving a lot of the logic
         | to user space means much less context switching for system
         | calls for the application.
         | 
         | I can imagine on Linux you can talk directly to /dev/eth0 if
         | you would want to (in the same way that you can talk to
         | /dev/sda), and then you would be back at square one regarding
         | root privileges.
        
           | monocasa wrote:
           | > I can imagine on Linux you can talk directly to /dev/eth0
           | if you would want to (in the same way that you can talk to
           | /dev/sda), and then you would be back at square one regarding
           | root privileges.
           | 
           | It's an AF_PACKET, SOCK_RAW socket rather than a device file,
           | but yes.
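
For reference, a hedged Python sketch of opening such a socket on Linux. `open_packet_socket` is a hypothetical helper; the interface name `lo` is an assumption. Creating the socket normally needs root or CAP_NET_RAW, so the helper returns None instead of raising when the kernel refuses.

```python
import socket

ETH_P_ALL = 0x0003  # from <linux/if_ether.h>: receive every protocol


def open_packet_socket(ifname: str = "lo"):
    """Return a raw link-layer socket bound to ifname, or None if refused."""
    if not hasattr(socket, "AF_PACKET"):
        return None  # AF_PACKET is Linux-specific
    try:
        sock = socket.socket(
            socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL)
        )
    except OSError:
        return None  # typically PermissionError without root/CAP_NET_RAW
    try:
        sock.bind((ifname, 0))
    except OSError:
        sock.close()
        return None  # e.g. the interface does not exist
    return sock


sock = open_packet_socket()
if sock is None:
    print("no packet socket (not Linux, or missing privileges)")
else:
    print("packet socket opened; frames include the link-layer header")
    sock.close()
```

A user-space TCP stack built on this still depends on a privileged operation to get at the frames, which is the "back at square one" point made above.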
        
         | yakubin wrote:
         | You don't need to be root to mount disks, when you have udisks
         | installed (which would be almost all distros by default). See
         | _udisksctl(1)_ : <https://manpages.debian.org/buster/udisks2/ud
         | isksctl.1.en.ht...>
        
         | boomlinde wrote:
         | There's nothing inherent about Linux which prevents you from
         | running everything as uid 0. If you're fine with every process
         | you run having the same full privileges and shared ownership of
         | everything, you should.
         | 
         | Most machines, at least outside embedded devices, are not like
         | this. They are multi-user systems even when there's only ever
         | one breathing thing at the desk because it offers a degree of
         | separation between the privileges of your daemons, your pid 1,
         | your web browser etc.
        
           | heurisko wrote:
           | I think the point was that root shouldn't be required for
           | "common administrative tasks". The nuclear option of running
           | everything as root doesn't address this.
        
             | stingraycharles wrote:
             | What are the common administrative tasks related to
             | networking that require root? All I can
             | think about is stuff like route tables and dhcp, both of
             | which live at the IP/Ethernet level rather than TCP.
        
               | adrian_b wrote:
               | Starting any network server process that uses ports under
               | 1024, like most standard protocols (https, http, ssh,
               | smtp, dns, ntp, dhcp etc.), requires root rights on any
               | UNIX-like operating system.
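
A small Python probe for the restriction described above (the privileged range is conventionally ports below 1024). `bind_error` is a hypothetical helper. The result for port 80 depends on the system: it can succeed when run as root, with the right capability, or on Linux systems where the `net.ipv4.ip_unprivileged_port_start` sysctl has been lowered, so no particular output is claimed here.

```python
import errno
import socket
from typing import Optional


def bind_error(port: int, host: str = "127.0.0.1") -> Optional[str]:
    """Try to bind a TCP socket; return None on success, else the errno name."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc))
    return None


for port in (80, 8080):
    result = bind_error(port)
    print(port, "bound OK" if result is None else f"failed: {result}")
```

On a default Linux setup an unprivileged user would typically see `EACCES` for port 80, while `EADDRINUSE` would mean the port is merely taken by another process.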
               | 
               | Most personal computers do not need server processes
               | (unless you want to connect remotely to them), but your
               | question was not restricted to them.
        
               | toast0 wrote:
               | You can adjust the highest privileged port (at least on
               | FreeBSD). It's convenient to set that to 79 and let
               | regular users listen to http without needing root to
               | listen.
               | 
               | ssh and smtp generally need root to do their job,
               | although maybe you could find a way to deliver mail to
               | users without it. If you want to run user based dns or
               | others, you could set the privileged port even lower.
        
               | boomlinde wrote:
               | From a practical point of view, regardless of the scope
               | of the original question, this is the kind of scenario
               | where you'd really want the restriction. More than a
               | simple administrative task it's a dangerous attack vector
               | to allow any user to launch your httpd or DNS.
               | 
               | That being said, check out capabilities(7) in Linux. You
               | can grant an executable the privilege of binding to a low
               | port when run by non-0 uid through setcap. This is a good
               | compromise.
        
               | convolvatron wrote:
               | this whole 'privileged ports' nonsense is left over from
               | a time where some process on another machine running on a
               | low port was somehow to be trusted - because the person
               | running that process was another administrator, and you
               | can generally trust those guys (as opposed to unwashed
               | users).
               | 
               | that world didn't last very long, and I wish we could
               | vent some of these designs that didn't pass the test of
               | time.
        
             | boomlinde wrote:
             | What the single user is called is a technicality. The
             | logical conclusion is the same: your login account has
             | administrative privileges and processes run by that account
             | have administrative privileges as a consequence.
             | 
             | The point I'm getting at isn't to promote the nuclear
             | option, but suggest that maybe there's a good reason for
             | e.g. a web browser or your word processor to not have the
             | same privileges as a user who can execute "simple
             | administrative tasks" like changing the TCP/IP stack
             | through which all your network traffic passes.
        
       | eatonphil wrote:
       | I couldn't make it past serialization/deserialization logic in my
       | own hobbyist TCP/IP stack. Even that part was super buggy. Next
       | time around I'm definitely going to be unit testing more parts
       | otherwise it's too hard for a beginner to get the easy parts
       | right let alone the harder parts.
       | 
       | Also, take a look at gvisor's network stack. It's definitely unit
       | tested.
       | 
       | https://github.com/google/gvisor/tree/master/pkg/tcpip/link/...
       | (an example)
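
As an illustration of unit testing the serialization/deserialization layer mentioned above, here is a minimal Python sketch of a round-trip test for the fixed 20-byte TCP header. `TcpHeader` is a hypothetical class written for this example (it is unrelated to gvisor's code), and it ignores options and checksum computation.

```python
import struct
from dataclasses import dataclass

# Fixed part of the TCP header: ports, seq, ack, offset byte, flags byte,
# window, checksum, urgent pointer. Network byte order, 20 bytes.
_TCP_HEADER = struct.Struct("!HHIIBBHHH")


@dataclass(frozen=True)
class TcpHeader:
    src_port: int
    dst_port: int
    seq: int
    ack: int
    flags: int
    window: int
    checksum: int = 0
    urgent: int = 0

    def pack(self) -> bytes:
        data_offset = 5 << 4  # header length of 5 words in the upper 4 bits
        return _TCP_HEADER.pack(
            self.src_port, self.dst_port, self.seq, self.ack,
            data_offset, self.flags, self.window, self.checksum, self.urgent,
        )

    @classmethod
    def parse(cls, data: bytes) -> "TcpHeader":
        if len(data) < _TCP_HEADER.size:
            raise ValueError("truncated TCP header")
        (src, dst, seq, ack, offset, flags,
         window, checksum, urgent) = _TCP_HEADER.unpack_from(data)
        if offset >> 4 < 5:
            raise ValueError("data offset below the 5-word minimum")
        return cls(src, dst, seq, ack, flags, window, checksum, urgent)


# Round-trip test: parse(pack(h)) must give back h.
header = TcpHeader(src_port=443, dst_port=51000, seq=1, ack=2,
                   flags=0x12, window=65535)   # 0x12 = SYN|ACK
raw = header.pack()
assert len(raw) == 20
assert raw[:2] == b"\x01\xbb"                  # 443 in network byte order
assert TcpHeader.parse(raw) == header
```

Round-trip checks like this, plus a few malformed inputs, tend to catch byte-order and field-width mistakes before they reach the harder parts of the stack.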
        
         | amscanne wrote:
         | This is perhaps a better example:
         | https://github.com/google/gvisor/blob/master/pkg/tcpip/trans...
         | 
         | Also, some networking tests use separate frameworks (which look
         | more like the setup the original post is describing, since
         | those are needed also), e.g.:
         | https://github.com/google/gvisor/tree/master/test/packetimpa...
        
       ___________________________________________________________________
       (page generated 2021-08-28 23:02 UTC)