[HN Gopher] Unit testing a TCP stack (2015)
___________________________________________________________________
Unit testing a TCP stack (2015)
Author : PuercoPop
Score : 120 points
Date : 2021-08-26 06:15 UTC (2 days ago)
(HTM) web link (www.snellman.net)
| gravypod wrote:
| There's a lot of push back from engineers - especially people at
| lower levels of the stack - against testing infrastructure. One
| particularly famous example is Linux. Rather than testing before
| merging in code, they merge in code and then test the release
| candidate as a whole. It also seems game developers are extremely
| against automated testing frameworks as a whole. I've heard many
| times that it would be impossible to develop an enemy AI in a
| test-driven way (I did this for a senior project in college -
| finished the AI before the game was able to even start testing it
| [0]).
|
| I wonder what would need to happen to convince people that:
|
| 1. Even if you do something extremely low level, you can draw a
| distinction between your hardware and the interface that 99% of
| your software runs at.
|
| 2. You can develop complex behaviors iteratively with automated
| testing just like you can develop complex programs iteratively
| (tests are just programs).
|
| [0] - https://github.com/gravypod/it491-disabler-ai
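Editorial sketch of point 1 above, assuming a hypothetical `Nic` interface as the boundary between the hardware and the rest of the code. Every name here (`Nic`, `FakeNic`, `send_in_chunks`) is invented for illustration and comes from no real driver or project. The idea is only that logic written against the interface can be tested with an in-memory fake:

```python
from typing import Protocol


class Nic(Protocol):
    """Hypothetical boundary between the hardware and everything above it."""

    def send_frame(self, frame: bytes) -> None: ...


class FakeNic:
    """In-memory stand-in for a device: records frames instead of transmitting."""

    def __init__(self) -> None:
        self.sent = []  # list of frames handed to the "hardware"

    def send_frame(self, frame: bytes) -> None:
        self.sent.append(frame)


def send_in_chunks(nic: Nic, payload: bytes, mtu: int) -> int:
    """Split payload into frames of at most mtu bytes; return the frame count."""
    if mtu <= 0:
        raise ValueError("mtu must be positive")
    count = 0
    for offset in range(0, len(payload), mtu):
        nic.send_frame(payload[offset:offset + mtu])
        count += 1
    return count
```

A test constructs `FakeNic()`, calls `send_in_chunks`, and inspects `nic.sent`. Only the thin class that implements `send_frame` against the real device needs hardware to exercise.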
| axguscbklp wrote:
| Or maybe those people know what they are talking about and the
| truth is that many people who are big fans of automated testing
| tend to overrate its value when it comes to many areas of
| software development. Testing takes effort and makes it harder
| to change things. In an ideal world where testing came for free
| then sure, more testing would be better. In the real world
| there are tradeoffs. If I am writing code that controls a
| spaceship then it makes sense to spend a huge amount of effort
| on testing. On the other hand, if I am adding a feature to a
| web application then in my personal experience, most of the
| time adding automated testing is a waste of effort.
| gravypod wrote:
| > Or maybe those people know what they are talking about and
| the truth is that many people who are big fans of automated
| testing tend to overrate its value when it comes to many
| areas of software development.
|
| There's been a lot of research, and internal studies, done at
| many companies that show pretty impressive benefits.
|
| When really questioned most engineers just say "I know my
| code works" or "I test my code, I don't need automated
| tests". That's the mentality I just don't understand.
|
| > Testing takes effort and makes it harder to change things.
|
| If it "makes things hard to change" just delete the test?
| You'll still get the benefit of knowing XYZ are
| broken/altered. You can also automate end-to-end and black
| box tests which should absolutely not require any
| modification if you're just refactoring.
|
| > If I am writing code that controls a spaceship then it
| makes sense to spend a huge amount of effort on testing. On
| the other hand, if I am adding a feature to a web application
| then in my personal experience, most of the time adding
| automated testing is a waste of effort.
|
| If you are working on something that is allowed to fail, then
| sure, you don't really need to care about what practices you
| do. It's a very end-all-be-all argument to say "it's ok for
| my things to break". That argument goes just the same for all
| of these things:
|
| "Why do I need a version control system? It's fine if I
| manually merge my code incorrectly"
|
| "Why do I need a build system? It's fine if I forget to
| recompile one of my files"
|
| etc.
|
| In addition: the "argument" for automated testing isn't that
| it will just prevent you from breaking something. It's that
| it lets you know when things change and makes it easy to
| update your code without manually checking if things are
| broken. Recently, when adding features to our frontend, I
| just run our tests and update a png file in our repo. I then
| play around until my styling is how I like it. It's
| completely automated and saves me a lot of time. It also lets
| others know immediately when their CSS change will affect, or
| will not affect, my components.
| axguscbklp wrote:
| >There's been a lot of research, and internal studies, done
| at many companies that show pretty impressive benefits.
|
| I would need to look at the research and studies to see
| whether I actually believe them. I have seen how
| politicized and tribal technical decisions can become at a
| company and can easily imagine that there might be
| confounding variables.
|
| >When really questioned most engineers just say "I know my
| code works" or "I test my code, I don't need automated
| tests". That's the mentality I just don't understand.
|
| If I do not write automated tests and then my code works
| fine in production 95% of the time, and out of the 5% of
| the time that it breaks, 99% of the time it is a problem
| that is easily fixed and causes no major problems, and even
| the few major problems are of the "lose a manageable amount
| of money" kind and not the "people get injured or killed"
| kind - meanwhile if, on the other hand, writing automated
| tests would add 50% more effort to my work - then the
| cost/benefit analysis might suggest that I should not write
| automated tests. Keep in mind that I could spend that 50%
| more effort instead doing things that will make my code
| less likely to break but that are not automated testing.
| "not adding automated testing" is not the same thing as
| "not doing anything that will make the code less likely to
| break".
|
| >If it "makes things hard to change" just delete the test?
| You'll still get the benefit of knowing XYZ are
| broken/altered.
|
| If I delete the test then I will have wasted part of the
| effort that went into writing it. Usually when I change
| code I already know that things will work differently
| afterward and that things might break, so this would not
| tell me anything I did not already know.
|
| >You can also automate end-to-end and black box tests which
| should absolutely not require any modification if you're
| just refactoring.
|
| Agreed - I am much more friendly towards end-to-end tests
| than towards unit tests. I still would not advocate them
| dogmatically, but I find them to be more useful than unit
| tests.
|
| >If you are working on something that is allowed to fail, then
| sure, you don't really need to care about what practices
| you do. It's a very end-all-be-all argument to say "it's ok
| for my things to break". That argument goes just the same
| for all of these things:
|
| I think that you might be seeing things in too binary a
| way. The vast majority of software products are allowed to
| fail, but not all of them are allowed to fail to the same
| degree. The practices do matter and there is no one-size-
| fits-all approach to testing. The context matters. What
| rate of failure is acceptable such that to try to prevent
| failure beyond that rate would actually be
| counterproductive? What specific impacts would more vs.
| less testing have on the development process? How do the
| relevant pros and cons fit into the overall goal of the
| organization? Etc.
|
| I am not saying "it's ok for my things to break", nor are
| probably most people who question automated testing dogma
| saying "it's ok for my things to break". We are saying that
| there are tradeoffs. Sometimes adding more automated
| testing does not actually add value. Again, it depends on
| the exact context.
|
| Regarding the rest of what you wrote: I am not disputing
| that automated testing can bring lots of benefits. I am
| just saying that I think some people are too dogmatic about
| it, see it too much as a magic pill, push for its use too
| broadly, and do not take the relevant tradeoffs
| sufficiently into account.
| lloydatkinson wrote:
| I agree - it seems it's only excuses that keep certain devs
| away from testing. Even Factorio manages testing, though it's
| more of an integration test. I'm sure it could be done with
| unit tests too.
| gravypod wrote:
| I think even integration tests would be a huge improvement
| for most teams and software. As long as some automated checks
| are run before code is merged you're going to save yourself a
| lot of heart ache.
| PezzaDev wrote:
| Automated testing is hard to justify for games because you are
| not simply finishing features and moving on. You are constantly
| experimenting. Throwing out features and ideas and reworking
| them is part of the creative process. Automated testing adds
| overhead to iteration and isn't free, so you have to be very
| selective. At the end of the day it is more important for the
| game to be fun than for it to be stable.
| jsnell wrote:
| Though there are games where the fun has been proven, and
| it's important to be able to iterate fast without breaking
| what's already there. There's a great 2016 blog post from
| Riot on how they test LoL:
|
| https://technology.riotgames.com/news/automated-testing-
| leag...
| [deleted]
| foxfluff wrote:
| > I wonder what would need to happen to convince people that ..
|
| 3. It's worth it.
|
| I work at (relatively) low levels, and I would absolutely love
| to have extensive tests (plus more, e.g. TLA+ models to prove
| critical properties of the systems I work on).
|
| The pushback comes from stakeholders. They don't want to invest
| time and money into automated testing.
|
| And when no automated testing has been done yet, you can guess
| that the system hasn't been architected to be easily testable.
| Figuring out how to add useful tests without massive (time-
| consuming and expensive, potentially error-prone) re-
| architecting is also something that requires quite a bit of
| investment.
|
| Of course part of it is just lack of experience. If someone who
| knows how it's done could lead by example and show the ropes,
| that'd probably help. Getting the framework off the ground
| could be the key to sneaking in some tests in the future, even
| when nobody asks for them.
| rad_gruchalski wrote:
| I did tests for something like this once. Not low level, more
| of a set of 30+ microservices but the concept is there. The
| black box testing. This was for a smart home solution based
| on RabbitMQ. The client wanted to replace RabbitMQ with Kafka
| but they were anxious because there was no way to verify that
| the replacement would behave the same way.
|
| So we spent 2 months writing black box tests against the
| RabbitMQ version, swapped it out with Kafka and fixed all
| issues within a couple of weeks.
|
| Since then, I believe that the integration tests are so much
| more valuable than unit tests.
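Editorial sketch of the black-box approach described above, under heavy assumptions. The two broker classes are toy in-memory stand-ins, not RabbitMQ or Kafka clients, and the `publish`/`consume` interface is hypothetical. The point is that one contract check, written only against public behavior, can run unchanged before and after the swap:

```python
from collections import deque


class QueueBroker:
    """Toy stand-in for the original broker: one FIFO queue per topic."""

    def __init__(self):
        self._queues = {}

    def publish(self, topic, message):
        self._queues.setdefault(topic, deque()).append(message)

    def consume(self, topic):
        queue = self._queues.get(topic)
        return queue.popleft() if queue else None


class LogBroker:
    """Toy stand-in for the replacement: append-only log plus a read offset."""

    def __init__(self):
        self._logs = {}
        self._offsets = {}

    def publish(self, topic, message):
        self._logs.setdefault(topic, []).append(message)

    def consume(self, topic):
        log = self._logs.get(topic, [])
        offset = self._offsets.get(topic, 0)
        if offset >= len(log):
            return None
        self._offsets[topic] = offset + 1
        return log[offset]


def check_broker_contract(broker):
    """Black-box checks that use only the public publish/consume interface."""
    assert broker.consume("lights") is None          # empty topic yields nothing
    broker.publish("lights", "on")
    broker.publish("lights", "off")
    broker.publish("heating", "21C")
    assert broker.consume("lights") == "on"          # per-topic FIFO order
    assert broker.consume("heating") == "21C"        # topics are independent
    assert broker.consume("lights") == "off"
    assert broker.consume("lights") is None          # drained topic is empty again
    return True
```

Running `check_broker_contract` against both implementations is the miniature version of writing the tests against the old system first and then swapping the backend underneath them.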
| jsnell wrote:
| One benefit we discovered with this test framework after the blog
| post was written was that it made it much more convenient to do
| fuzzing and differential testing of the TCP stack. The core
| problem with fuzzing TCP is that there's a lot of incrementally
| built up state, and everything is extremely timing-dependent.
|
| You basically need the fuzzer to have a model of TCP state so
| that it can effectively explore the state space, which is quite
| complicated and not something you can do with off-the-shelf
| tools.
|
| But once you have a bunch of unit tests designed to put the TCP
| stack into a specific state + a way of saving and restoring that
| state, it's really easy to just have snapshots of interesting
| situations where you can run a fuzzer on the next packet to be
| transmitted and see what happens.
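Editorial sketch of the snapshot-then-fuzz idea, assuming a deliberately tiny `ToyConn` state machine that is not real TCP and is not the stack discussed in the article. `copy.deepcopy` stands in for whatever save/restore mechanism the real framework uses. All names are hypothetical:

```python
import copy
import random

STATES = {"LISTEN", "SYN_RCVD", "ESTABLISHED", "CLOSED"}


class ToyConn:
    """Deliberately tiny stand-in for a TCP connection; not real TCP."""

    def __init__(self):
        self.state = "LISTEN"
        self.rcv_nxt = 0  # next sequence number expected from the peer

    def on_packet(self, flags, seq, payload_len):
        if "RST" in flags:
            self.state = "CLOSED"
        elif self.state == "LISTEN" and "SYN" in flags:
            self.state, self.rcv_nxt = "SYN_RCVD", (seq + 1) % 2**32
        elif self.state == "SYN_RCVD" and "ACK" in flags:
            self.state = "ESTABLISHED"
        elif self.state == "ESTABLISHED" and seq == self.rcv_nxt:
            self.rcv_nxt = (self.rcv_nxt + payload_len) % 2**32


def established_snapshot():
    """Plays the role of a unit test that drives the stack into a known state."""
    conn = ToyConn()
    conn.on_packet({"SYN"}, seq=1000, payload_len=0)
    conn.on_packet({"ACK"}, seq=1001, payload_len=0)
    return conn


def fuzz_next_packet(snapshot, iterations, seed=0):
    """Feed one random packet to a fresh copy of the snapshot per iteration."""
    rng = random.Random(seed)
    seen_states = set()
    for _ in range(iterations):
        conn = copy.deepcopy(snapshot)  # restore the saved state
        flags = {f for f in ("SYN", "ACK", "FIN", "RST") if rng.random() < 0.5}
        conn.on_packet(flags, seq=rng.randrange(2**32),
                       payload_len=rng.randrange(1500))
        # Invariants that must hold no matter what the next packet is.
        assert conn.state in STATES
        assert 0 <= conn.rcv_nxt < 2**32
        seen_states.add(conn.state)
    return seen_states
```

Because each iteration starts from the same restored snapshot, the fuzzer only has to explore the next packet and does not need to rediscover the handshake. A differential variant would feed the same packet to two implementations and compare the resulting states.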
| 10000truths wrote:
| It would be nice to have a bring-your-own-I/O TCP stack library
| that *doesn't* rely on custom callbacks - something like BearSSL
| but for TCP, where the stack is just a pure state machine object
| and the user is responsible for explicitly shunting packets to
| and from the state machine, retaining control over when and how
| the I/O is done. Instead of having to define callbacks for
| retrieving time and consuming packets, why not explicitly pass
| the timestamp and packet data to a state machine object via a
| direct function call?
| Slix wrote:
| Is Cloudflare's Quiche QUIC library
| https://github.com/cloudflare/quiche similar to what you're
| looking for? All I/O must be done by the caller.
| 10000truths wrote:
| Yeah, that's the general idea. Essentially a state machine
| with a send queue and receive queue, and four operations:
|
| - Input received raw data
|
| - Output received application data
|
| - Input application data to send
|
| - Output raw data to send
|
| Obviously, since TCP connection state is time sensitive, the
| "raw data" wouldn't just be the IP packet and headers, but
| also a time stamp telling the state machine when that packet
| was received/sent. If you want the state machine to keep
| track of time even when no packets are being received or
| sent, there could be an additional operation just to input a
| timestamp without additional packets. In effect, time is just
| another input that the user is responsible for feeding to the
| state machine at sufficiently fine intervals.
|
| In practice, you could emulate this pattern with a callback-
| oriented protocol stack by populating an in-memory
| send/receive queue in your callback function, but that design
| can be somewhat inflexible because it forces potentially
| undesirable constraints, e.g. an extra memory copy that could
| otherwise be elided.
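Editorial sketch of the four operations plus the time input listed above, assuming a toy stop-and-wait protocol in place of TCP. The class, method names, 5-byte header and 1-second timeout are all invented for illustration and are not quiche's or BearSSL's API. The object performs no I/O and never reads a clock; the caller passes in both bytes and timestamps:

```python
import struct
from collections import deque

HEADER = struct.Struct("!cI")  # kind (b"D" data / b"A" ack), sequence number
RTO = 1.0  # retransmission timeout in seconds (arbitrary for this sketch)


class ToyEndpoint:
    """Stop-and-wait toy protocol as a pure state machine. Not TCP."""

    def __init__(self):
        self._app_out = deque()   # application data waiting to be sent
        self._app_in = deque()    # received data waiting for the application
        self._raw_out = deque()   # raw segments waiting for the caller to transmit
        self._in_flight = None    # (seq, payload, deadline) awaiting an ack
        self._next_seq = 0
        self._expected = 0

    def write_app(self, data):
        """Input application data to send."""
        self._app_out.append(data)

    def read_app(self):
        """Output received application data, or None if there is none."""
        return self._app_in.popleft() if self._app_in else None

    def receive_raw(self, now, segment):
        """Input received raw data. `now` is unused by this toy protocol."""
        kind, seq = HEADER.unpack_from(segment)
        if kind == b"D":
            if seq == self._expected:
                self._app_in.append(segment[HEADER.size:])
                self._expected += 1
            if seq < self._expected:  # ack new data and duplicates alike
                self._raw_out.append(HEADER.pack(b"A", seq))
        elif kind == b"A" and self._in_flight and seq == self._in_flight[0]:
            self._in_flight = None

    def advance_time(self, now):
        """Input a timestamp without a packet; may queue a retransmission."""
        if self._in_flight and now >= self._in_flight[2]:
            seq, payload, _ = self._in_flight
            self._in_flight = (seq, payload, now + RTO)
            self._raw_out.append(HEADER.pack(b"D", seq) + payload)

    def poll_transmit(self, now):
        """Output raw data to send, or None if nothing is pending."""
        if self._in_flight is None and self._app_out:
            payload = self._app_out.popleft()
            seq, self._next_seq = self._next_seq, self._next_seq + 1
            self._in_flight = (seq, payload, now + RTO)
            self._raw_out.append(HEADER.pack(b"D", seq) + payload)
        return self._raw_out.popleft() if self._raw_out else None
```

A test can simulate packet loss by simply not delivering a segment, then calling `advance_time` with a later timestamp and checking that `poll_transmit` returns the same bytes again. No sockets, sleeps, or callbacks are involved.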
| cjfd wrote:
| Yes, a TCP stack certainly is complex enough to warrant serious
| automated testing and/or TDD.
|
| The idea of putting the TCP stack in user space is interesting.
| If one actually could map the memory of the whole device into
| user space one could maybe have fewer system calls and therefore
| have better performance.
|
| Also, what I find somewhat irritating about using a linux system
| is how often one needs to run commands as root (sudo) for common
| administrative tasks like mounting a disk or stuff like that.
| Having a user space TCP stack could also decrease the need for
| that as far as setting up the network is concerned. If the linux
| machine is single user, as most of them are nowadays, it makes
| more sense that way, I think.
| avinassh wrote:
| > The idea of putting the TCP stack in user space is
| interesting.
|
| Indeed! Julia Evans wrote a really nice post explaining the
| usecases and benefits - https://jvns.ca/blog/2016/06/30/why-do-
| we-use-the-linux-kern...
| rmetzler wrote:
| > one needs to run commands as root (sudo) for common
| administrative tasks like mounting a disk
|
| I would think if you don't do this, an attacker who is able to
| execute code but is non-root yet could easily elevate
| permissions by shadowing legitimate paths and trick root into
| executing untrusted code.
|
| I'm not a security engineer and just find it interesting, so if
| my thinking is off, please correct me.
| stefan_ wrote:
| The whole "map the PCIE device into userspace process memory"
| thing is called DPDK (https://www.dpdk.org/)
| ay wrote:
| And you can combine the two:
|
| https://fd.io/docs/vpp/master/whatisvpp/hoststack.html
|
| And there is a sister project using this tech to get
| noticeable speed-ups:
|
| https://wiki.fd.io/view/VSAP
|
| Disclaimer: I am involved with the VPP project.
| stingraycharles wrote:
| > Having a user space TCP stack could also decrease the need
| for [root privileges] as far as setting up the network is
| concerned.
|
| I think it's important to distinguish between the protocol
| (TCP) and the hardware device. You would still absolutely need
| to talk to the device, it's just that moving a lot of the logic
| to user space means much less context switching for system
| calls for the application.
|
| I can imagine on Linux you can talk directly to /dev/eth0 if
| you would want to (in the same way that you can talk to
| /dev/sda), and then you would be back at square one regarding
| root privileges.
| monocasa wrote:
| > I can imagine on Linux you can talk directly to /dev/eth0
| if you would want to (in the same way that you can talk to
| /dev/sda), and then you would be back at square one regarding
| root privileges.
|
| It's an AF_PACKET, SOCK_RAW socket rather than a device file,
| but yes.
| yakubin wrote:
| You don't need to be root to mount disks, when you have udisks
| installed (which would be almost all distros by default). See
| _udisksctl(1)_ : <https://manpages.debian.org/buster/udisks2/ud
| isksctl.1.en.ht...>
| boomlinde wrote:
| There's nothing inherent about Linux which prevents you from
| running everything as uid 0. If you're fine with every process
| you run having the same full privileges and shared ownership of
| everything, you should.
|
| Most machines, at least outside embedded devices, are not like
| this. They are multi-user systems even when there's only ever
| one breathing thing at the desk because it offers a degree of
| separation between the privileges of your daemons, your pid 1,
| your web browser etc.
| heurisko wrote:
| I think the point was that root shouldn't be required for
| "common administrative tasks". The nuclear option of running
| everything as root doesn't address this.
| stingraycharles wrote:
| What are the common administrative tasks related to
| networking that require root? All I can
| think about is stuff like route tables and dhcp, both of
| which live at the IP/Ethernet level rather than TCP.
| adrian_b wrote:
| Starting any network server process that uses ports under
| 1024, like most standard protocols (https, http, ssh,
| smtp, dns, ntp, dhcp etc.), requires root rights on any
| UNIX-like operating system.
|
| Most personal computers do not need server processes
| (unless you want to connect remotely to them), but your
| question was not restricted to them.
| toast0 wrote:
| You can adjust the highest privileged port (at least on
| FreeBSD). It's convenient to set that to 79 and let
| regular users listen to http without needing root to
| listen.
|
| ssh and smtp generally need root to do their job,
| although maybe you could find a way to deliver mail to
| users without it. If you want to run user based dns or
| others, you could set the privileged port even lower.
| boomlinde wrote:
| From a practical point of view, regardless of the scope
| of the original question, this is the kind of scenario
| where you'd really want the restriction. More than a
| simple administrative task it's a dangerous attack vector
| to allow any user to launch your httpd or DNS.
|
| That being said, check out capabilities(7) in Linux. You
| can grant an executable the privilege of binding to a low
| port when run by non-0 uid through setcap. This is a good
| compromise.
| convolvatron wrote:
| this whole 'privileged ports' nonsense is left over from
| a time where some process on another machine running on a
| low port was somehow to be trusted - because the person
| running that process was another administrator, and you
| can generally trust those guys (as opposed to unwashed
| users).
|
| that world didn't last very long, and I wish we could
| vent some of these designs that didn't pass the test of
| time.
| boomlinde wrote:
| What the single user is called is a technicality. The
| logical conclusion is the same: your login account has
| administrative privileges and processes run by that account
| have administrative privileges as a consequence.
|
| The point I'm getting at isn't to promote the nuclear
| option, but suggest that maybe there's a good reason for
| e.g. a web browser or your word processor to not have the
| same privileges as a user who can execute "simple
| administrative tasks" like changing the TCP/IP stack
| through which all your network traffic passes.
| eatonphil wrote:
| I couldn't make it past serialization/deserialization logic in my
| own hobbyist TCP/IP stack. Even that part was super buggy. Next
| time around I'm definitely going to be unit testing more parts
| otherwise it's too hard for a beginner to get the easy parts
| right let alone the harder parts.
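Editorial sketch of the kind of serialization unit test meant here, assuming only the fixed 20-byte TCP header (no options) and ignoring the reserved bits. The `TcpHeader` class is invented for illustration and is not taken from gvisor or from the commenter's stack. A round-trip check plus a few hand-computed bytes catches most byte-order and bit-shift mistakes:

```python
import struct
from dataclasses import dataclass

# Fixed 20-byte header in network byte order: ports, seq, ack,
# offset/flags word, window, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")

FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10


@dataclass(frozen=True)
class TcpHeader:
    src_port: int
    dst_port: int
    seq: int
    ack: int
    flags: int            # low 8 control bits; reserved bits are dropped
    window: int
    checksum: int = 0     # checksum computation is out of scope for the sketch
    urgent: int = 0
    data_offset: int = 5  # header length in 32-bit words; 5 means no options

    def pack(self) -> bytes:
        offset_flags = (self.data_offset << 12) | (self.flags & 0xFF)
        return TCP_HEADER.pack(self.src_port, self.dst_port, self.seq,
                               self.ack, offset_flags, self.window,
                               self.checksum, self.urgent)

    @classmethod
    def parse(cls, data: bytes) -> "TcpHeader":
        if len(data) < TCP_HEADER.size:
            raise ValueError("truncated TCP header")
        (src, dst, seq, ack, offset_flags,
         window, checksum, urgent) = TCP_HEADER.unpack_from(data)
        data_offset = offset_flags >> 12
        if data_offset < 5:
            raise ValueError("data offset below minimum header length")
        return cls(src, dst, seq, ack, offset_flags & 0xFF,
                   window, checksum, urgent, data_offset)
```

Useful first tests are `TcpHeader.parse(h.pack()) == h` for a handful of headers, a check that a truncated buffer raises `ValueError`, and a comparison of specific bytes against values worked out by hand or taken from a packet capture.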
|
| Also, take a look at gvisor's network stack. It's definitely unit
| tested.
|
| https://github.com/google/gvisor/tree/master/pkg/tcpip/link/...
| (an example)
| amscanne wrote:
| This is perhaps a better example:
| https://github.com/google/gvisor/blob/master/pkg/tcpip/trans...
|
| Also, some networking tests use separate frameworks (which look
| more like the setup the original post is describing, since
| those are needed also), e.g.:
| https://github.com/google/gvisor/tree/master/test/packetimpa...
___________________________________________________________________
(page generated 2021-08-28 23:02 UTC)