[HN Gopher] Sysadmin friendly high speed Ethernet switching
       ___________________________________________________________________
        
       Sysadmin friendly high speed Ethernet switching
        
       Author : todsacerdoti
       Score  : 210 points
       Date   : 2024-04-24 08:32 UTC (14 hours ago)
        
 (HTM) web link (blog.benjojo.co.uk)
 (TXT) w3m dump (blog.benjojo.co.uk)
        
       | redleader55 wrote:
       | I'm curious what did this guy connect to several 100 GbE ports
       | and how does the upstream connections he mentions look like, and
       | from which provider. The device is second hand, so the likely
       | use-case is a Home(?)Lab type of setup.
        
         | benjojo12 wrote:
         | Hi, I am the guy.
         | 
         | > what did this guy connect to several 100 GbE
         | 
         | They have 100GBASE-PLR4 optics in them, that allow the 100G
         | ports to be split up into 4x25G ports (or actually in this
         | switch, 2x25G ports due to a hardware limitation with this
         | switch)
         | 
         | > how does the upstream connections he mentions look like, and
         | from which provider
         | 
         | They are just normal 10G-LR Single mode optics, in a data
         | center.
         | 
         | > The device is second hand, so the likely use-case is a
         | Home(?)Lab type of setup.
         | 
         | Nope, this device now runs my business bgp.tools
        
           | redleader55 wrote:
           | That suddenly makes a lot more sense! Thank you for the
           | explanations and for the writeup!
        
           | ComputerGuru wrote:
           | Stupid question, why does the datacenter provide you with LR
           | smf instead of SR mmf?
        
             | benjojo12 wrote:
             | In my eyes, MultiModeFibre is on its way out.
             | 
             | High speeds (25G+) do not have good solutions on
             | multimode, and the length limits that physics enforces on
             | multimode mean that it's not a "no-brainer" choice for
             | going any further than within the same rack row.
             | 
             | So if you are dealing with datacenter cross connects that
             | can exceed the max distance for MultiMode then you might
             | spend hours debugging broken stuff for no real gain.
             | MultiMode is slightly cheaper, but it's a false economy the
             | moment stuff does not work correctly. I've spoken to people
             | with DC cross connects that go into the 5km+ of cable
             | distance. So it's easier to just stock one kind of optic
             | per speed, and call it a day.
             | 
             | Equinix I think already phased out MMF XCs
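             | 
             | A minimal sketch of the reach argument above, in Python.
             | The figures are the nominal maximums usually quoted for
             | each optic type (multimode values assume OM4 fibre) and
             | are an illustration, not a link budget; the function name
             | is made up for this example.

```python
# Nominal maximum reach in metres, as commonly quoted for each optic type.
# Multimode entries assume OM4 fibre; real links also depend on connectors,
# patch panels and the optical loss budget.
NOMINAL_REACH_M = {
    "10GBASE-SR": 400,        # multimode
    "25GBASE-SR": 100,        # multimode
    "100GBASE-SR4": 100,      # multimode
    "10GBASE-LR": 10_000,     # single-mode
    "25GBASE-LR": 10_000,     # single-mode
    "100GBASE-LR4": 10_000,   # single-mode
}


def optics_that_reach(run_length_m):
    """Return the optic types whose nominal reach covers the cable run."""
    return sorted(
        name for name, reach in NOMINAL_REACH_M.items() if reach >= run_length_m
    )


if __name__ == "__main__":
    # In-row patch, longer in-building run, and a long DC cross connect.
    for run in (30, 250, 5_000):
        print(f"{run} m ->", optics_that_reach(run))
```

             | With a 5 km cross connect only the LR optics remain, which
             | is the "stock one kind of optic per speed" point.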
        
               | ComputerGuru wrote:
               | Thanks for the info.
        
               | tssva wrote:
               | I stopped dealing with large data center providers in
               | 2014. By that point we had switched to using single mode
               | within our colocation spaces and most of the colocation
               | data center providers we dealt with had stopped providing
               | any multimode cross connects prior to that.
        
             | karma_pharmer wrote:
             | Friends don't let friends use multi-mode fiber.
             | 
             | Single-mode fiber is future-proof. Utilities bury that
             | stuff and depreciate it over a 30-year lifetime. There has
             | been like one spec change since the 1970s.
             | 
             | Single-mode fiber. Always single-mode fiber. Nothing else,
             | ever.
        
             | gorkish wrote:
             | Stupid answer: Multimode sucks. The only reason to use it
             | is if the person before you failed to know that it sucks
             | and now it cannot be replaced.
        
             | somat wrote:
             | For those unfamiliar with fiber optics: single mode fiber is
             | better than multi mode in nearly every category. The only
             | advantage multimode has is when terminating it. The looser
             | tolerances and large fibers make it easier to attach an
             | end.
        
       | PreInternet01 wrote:
       | Yeah, the whole "(hyper)converged" network gear experience has
       | been... let's say less than overwhelming. I'm happy that this
       | person managed to get an experience they're happy with, but keep
       | in mind that:
       | 
       | -The SN2010 retailed for about 10K (Euro/Dollars)
       | 
       | -It has never been truly available, as in: you could go
       | somewhere, order it, and expect it to turn up in 2-3 days
       | 
       | -Even though small, these units tend to be _loud_, with at least
       | 2 tiny fans making _a lot_ of high-pitched noise
       | 
       | But the most salient point is that, even with "Linux on a
       | router/switch", there's no guarantee that you'll get decent
       | performance, as that _entirely_ depends on how well the kernel
       | understands the (proprietary) onboard chipset, which usually
       | means that you're squarely in "well, here's this blob that works
       | on _certain_ kernel versions, and good luck with that!"
       | territory.
        
         | mattpallissard wrote:
         | Hyperconverged infra is basically bundled networking, storage,
         | and hypervisor. Think nutanix, vxrail, Cisco ucs.
        
         | benjojo12 wrote:
         | > But the most salient point is that, even with "Linux on a
         | router/switch", there's no guarantee that you'll get decent
         | performance
         | 
         | As long as the ASIC is programmed the performance is the same
         | between kernel versions. If the ASIC does not get programmed
         | then it's like the entire device does not work at all (so you
         | likely need to roll back the update you made).
         | 
         | > which usually means that you're squarely in "well, here's
         | this blob that works on certain kernel versions, and good luck
         | with that!" territory.
         | 
         | As mentioned in the post, the switch in question has blobless
         | drivers, unlike the broadcom stuff you likely have experience
         | with based on what you are saying
        
         | wmf wrote:
         | You're FUDding yourself pretty hard. All of these points have
         | been addressed in various comments in this thread.
        
       | jacknews wrote:
       | Woah these are around $5k.
       | 
       | Not sure that's a fit for home network enthusiasts, and for
       | business, just pay the bucks to be able to pass the buck - ie get
       | a packaged solution not homebrew.
        
         | benjojo12 wrote:
         | > Woah these are around $5k.
         | 
         | For what it's worth I did not pay 5k USD for it, that is
         | significantly overpriced for it on the 2nd hand market.
         | 
         | > for business, just pay the bucks to be able to pass the buck
         | - ie get a packaged solution not homebrew
         | 
         | There is a significant cost difference here.
         | 
         | The overall point of the post is that the "packaged solution"
         | is not viable when most switch vendors have what I can describe
         | as "crap" software quality and/or support.
         | 
         | So if you are in the land of simple L2 switching and L3
         | routing, this switch is amazing because you can escape the crap
         | vendor software.
        
           | hinkley wrote:
           | Sort of an openwrt for hard lines.
        
         | bradfa wrote:
         | eBay shows me a whole bunch of these SN2010-series switches in
         | the $1000-2000 USD range. Totally reasonable for such a switch.
         | You can step up to the SN2700 mentioned in the article for
         | prices in the $2000-3000 USD range. All used from sellers
         | claiming "tested".
         | 
         | If you're really into homelab stuff or trying to run a small
         | business where networking is critical, they're quite a good
         | deal.
         | 
         | The "packaged solutions" as a new product from a known vendor
         | with a support contract (because honestly if you're not buying
         | the support contract you're going to spend a LOT of time
         | messing with the switch no matter what vintage it is) are at
         | least 10x the price, which very well could be much too much for
         | a home user or small business.
        
         | candiddevmike wrote:
         | I've seen a few chinese knock off switches on Amazon and
         | Alibaba that are cheap and layer 3/"managed". I messaged a few
         | of the alibaba sellers telling them I'm "building a custom
         | switch OS and want their hardware to prototype" and they seemed
         | amenable to helping me get a custom firmware installed, but I
         | never pursued it further.
         | 
         | I'd love to have managed 2.5Gb switches with 10Gb uplinks in my
         | house using a custom linux OS that I can use standard config
         | management tools with...
        
           | zokier wrote:
           | > I'd love to have managed 2.5Gb switches with 10Gb uplinks
           | in my house using a custom linux OS that I can use standard
           | config management tools with
           | 
           | Me too. I have been eyeballing "SparX-5" based switches,
           | which do run Linux ("SMBStaX"), but you'd need something like
           | a million bucks to get anywhere :( Could be a candidate for a
           | Kickstarter style project maybe...
        
             | tambre wrote:
             | Check out RCHD-SPARX [0]. You'll need to draw the rest of
             | the owl, but the really hard part's there. I hope someone
             | builds a switch on that.
             | 
             | [0] https://conclusive.tech/products/rchd-sparx-networking-som/
        
           | wolrah wrote:
           | > I'd love to have managed 2.5Gb switches with 10Gb uplinks
           | in my house using a custom linux OS that I can use standard
           | config management tools with...
           | 
           | 100%, this has been the most frustrating thing about
           | following the various "open" switching worlds. There's a
           | massive gap in the middle between the sorts of 4-8 port
           | switches that end up in OpenWRT-compatible routers and this
           | sort of enterprise switch that's barely accessible to the
           | "homelab" class user.
           | 
           | I would absolutely love to have some open switching in the
           | "Ubiquiti" class, desktop and 1U rackmount devices with
           | gigabit through 10G as their primary interfaces. I'm
           | personally in the VoIP world and if I could install Asterisk
           | directly on a 48 port PoE switch I'd be deploying them by the
           | dozens.
        
           | wmf wrote:
           | _I'd love to have managed 2.5Gb switches with 10Gb uplinks
           | in my house using a custom linux OS that I can use standard
           | config management tools with..._
           | 
           | Check out DENT NOS: https://dent.dev/ There's a Delta 32x1G
           | (PoE+) + 16x2.5G (PoE++) + 6x25G SFP28 switch that can run
           | DENT.
        
             | candiddevmike wrote:
             | Those Delta ones look nice but I can't find a reseller for
             | them unfortunately (specifically the DVS-G106W02-2GF)
        
       | greyface- wrote:
       | Really tempting price/performance, and I love the idea of getting
       | away from questionable switch vendor OSes.
       | 
       | How disruptive are switch config changes? If I edit
       | /etc/network/interfaces to add a new port, vlan, etc - does a
       | `systemctl restart networking` (or whatever equivalent) bounce
       | ports or halt switching for a moment while changes are applied?
        
         | treffer wrote:
         | It just depends on what you use for management.
         | 
         | IIRC the /etc/network/interfaces does a reconfiguration that's
         | pretty disruptive.
         | 
         | Things like brctl and ethtool worked on the fly without issues
         | (note though that I mostly used Arista years ago).
         | 
         | It is usually non-disruptive if it gets applied as deltas. If
         | your config tool does a teardown/recreate then that's
         | disruptive. Within the bounds of ethernet and routing protocols
         | (OSPF DR/BDR changes are disruptive, STP can be fun, ....).
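         | 
         | A rough Python sketch of the "applied as deltas" idea: diff the
         | current and desired VLAN membership and emit only the
         | `bridge vlan` commands for what changed. The port names and the
         | helper name are hypothetical, and the commands are only
         | printed, not run.

```python
def vlan_delta_commands(current, desired):
    """Return `bridge vlan` commands that move `current` to `desired`.

    Both arguments map a port name to the set of VLAN IDs on that port.
    Only differences produce commands, so unchanged ports are left alone.
    """
    commands = []
    for port in sorted(set(current) | set(desired)):
        have = current.get(port, set())
        want = desired.get(port, set())
        # Add what is missing first, then remove what is no longer wanted.
        for vid in sorted(want - have):
            commands.append(f"bridge vlan add dev {port} vid {vid}")
        for vid in sorted(have - want):
            commands.append(f"bridge vlan del dev {port} vid {vid}")
    return commands


if __name__ == "__main__":
    # Hypothetical state: swp1 swaps VLAN 20 for VLAN 30, swp2 is untouched.
    current = {"swp1": {10, 20}, "swp2": {10}}
    desired = {"swp1": {10, 30}, "swp2": {10}}
    for command in vlan_delta_commands(current, desired):
        print(command)
```

         | A teardown/recreate tool would instead touch swp2 as well,
         | which is where the disruption comes from.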
        
         | benjojo12 wrote:
         | > How disruptive are switch config changes? If I edit
         | /etc/network/interfaces to add a new port, vlan, etc
         | 
         | They are seamless as long as your configuration does not do
         | something stupid like tear down interfaces to reconfigure them.
         | The switch takes no noticeable time to program from the Linux
         | space to the ASIC
         | 
         | > - does a `systemctl restart networking` (or whatever
         | equivalent) bounce ports or halt switching for a moment while
         | changes are applied?
         | 
         | "systemctl restart networking" will typically blow away even
         | the most well-configured systems in my experience.
         | 
         | In the post I suggest using ifupdown (the first one), since
         | it's the most "easy" to debug, but I'm sure networkctl works
         | too, with a healthy amount of systemd restraint
        
       | bitbckt wrote:
       | I have an SN2700 in my rack, next to a pair of Arista 7060CXs (as
       | a point of comparison). These are wildly under-rated devices
       | outside of the STH fanbase.
       | 
       | You may be surprised at how quiet and low power these Mellanox
       | switches can be.
        
         | benjojo12 wrote:
         | Quiet in terms of DC grade switching, I'm not sure you
         | could get away with such a switch in a home environment. The
         | 100G optics need plenty of airflow to keep cool so it's not
         | just a case of swapping the fans for something smaller
        
           | bitbckt wrote:
           | The rack I'm referring to is in my home. YMMV, of course.
        
           | dgacmu wrote:
           | You can use a 100g DAC within a single rack (which is also
           | cheaper). I only have a tiny bit of 100g at home and just do
           | 10g optics for the connection out of the rack. Of course,
           | that's my weird setup and it won't work once I want 100g to
           | my office.
        
       | ComputerGuru wrote:
       | This is really, really cool and I didn't know the platform was
       | open to the extent that you could install your own upstream Linux
       | and just get going.
       | 
       | I am curious though what configuration option prevents this from
       | ending up with software switching. I understand the mellanox
       | kernel module was compiled and loaded, but certainly that doesn't
       | mean that _anything_ you do in the network stack gets converted
       | to switch fabric code and uses hardware packet switching. How do
       | you make sure that you don't errantly wind up with poor latency
       | and capped throughput?
       | 
       | But also, short of making and selling your own networked device
       | for whatever reason, what are the real benefits of going this
       | approach? I can see crazy use cases for where you have full
       | control of the network stack (but again, see my point above --
       | how do you guarantee you are not doing this in software?) but for
       | _most_ purposes, especially with qsfp and fiber, how much are you
       | really gaining by doing it on-device? What is the killer use case
       | here?
       | 
       | EDIT
       | 
       | Upon rereading, it seems the switch is hard-coded to be hardware-
       | switched and cannot end up in a situation where you are
       | accidentally using software packet switching in the first place
       | (i.e. it does not just optimize to a hardware packet switching
       | state). But that limits what you can do considerably, to the
       | point that an off-the-rack Juniper or CSCO or whatever probably
       | has more features than you can do here without writing your own
       | code to hook into the mellanox sdk?
        
         | benjojo12 wrote:
         | > But that limits what you can do considerably, to the point
         | that an off-the-rack Juniper or CSCO or whatever probably has
         | more features than you can do here without writing your own
         | code to hook into the mellanox sdk?
         | 
         | I mean, I'm not touching any mellanox sdk here, I am using
         | a very similar stack that someone on a "software router" would
         | use, on a switch that can automatically accelerate it to 800G+
         | throughputs, while hitting a 60W power target.
         | 
         | You can hit some of those performance/power numbers in vendor
         | hardware like Juniper/Cisco/Arista, however you have to also
         | put up with their software. I (and others in my group of peers)
         | have not had great experiences with vendor software, and in
         | this setup I am able to patch/fix the software on my own terms.
         | 
         | If there is a security vuln in one section, I can fix that, and
         | call it a day, I won't be forced to upgrade parts of the system
         | I do not want to. I cannot do this with Juniper/Cisco/Arista
         | always.
        
           | hamandcheese wrote:
           | Is it obvious when you try to use a config that won't be
           | accelerated? Or is the config silently ignored?
        
             | benjojo12 wrote:
             | It really depends on how well you know what you are doing.
             | If you stick to:
             | 
             | *) IP Routing that would normally "fit" in a vendor switch
             | 
             | *) Bridging
             | 
             | *) VRFs
             | 
             | You will be fine
             | 
             | If you try and do some weird stuff then it's best to check
             | with "ip route" to see if it was actually installed into
             | hardware or not, but I would simply not do anything weird
             | on such hardware
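             | 
             | One way to script that check, sketched in Python against
             | `ip -json route show` output. It assumes the driver marks
             | programmed routes with the `offload`, `rt_offload` or
             | `rt_trap` flags that recent iproute2 prints; the sample
             | data and port names are invented for illustration.

```python
import json

# Hand-written sample in the shape of `ip -json route show` output.
# It is illustrative only and was not captured from a real switch.
SAMPLE = """
[
  {"dst": "default", "gateway": "192.0.2.1", "dev": "swp1", "flags": ["rt_offload"]},
  {"dst": "198.51.100.0/24", "dev": "swp2", "flags": ["rt_offload"]},
  {"dst": "203.0.113.0/24", "dev": "swp3", "flags": []}
]
"""

# Assumption: any of these flags means the ASIC knows about the route.
HARDWARE_FLAGS = {"offload", "rt_offload", "rt_trap"}


def routes_not_in_hardware(ip_json_text):
    """Return destinations of routes that carry none of the hardware flags."""
    routes = json.loads(ip_json_text)
    return [
        route["dst"]
        for route in routes
        if not HARDWARE_FLAGS & set(route.get("flags", []))
    ]


if __name__ == "__main__":
    for dst in routes_not_in_hardware(SAMPLE):
        print("not offloaded:", dst)
```

             | On a live box the text would come from running
             | `ip -json route show` rather than from a constant.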
        
             | karma_pharmer wrote:
             | Yes. The switchdev "sw1p[0-9]+" ports are special; any
             | data the software kernel injects to them is discarded and
             | they never emit packets to the kernel. They exist only to
             | allow you to use `bridge` and `ip route` on them. So if
             | you accidentally configure software switching on these
             | ports no data will flow -- it will be totally obvious. You
             | might get "no packets" by accident but you will never get
             | "software switching" by accident.
             | 
             | If you really _want_ software switching you have to use the
             | management port (there's only one or two of these) whose
             | name is "eth0" or "eth1" or something like that. So
             | avoiding "accidental software switching" is really easy --
             | if you're typing "eth" you're doing it wrong. You can even
             | explicitly delete this interface if you don't need the CPU
             | to be able to snoop/inject traffic to/from the switch
             | ports.
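             | 
             | The naming rule above could be turned into a small lint,
             | sketched here in Python. The `sw1p[0-9]+` pattern is
             | taken from this comment and the `eth[0-9]+` one is a
             | guess at management port names; actual names depend on
             | the driver and udev rules.

```python
import re

# Both patterns are assumptions for this sketch; adjust to the real names.
FRONT_PANEL = re.compile(r"^sw1p[0-9]+$")
MANAGEMENT = re.compile(r"^eth[0-9]+$")


def classify(ifname):
    """Label an interface name as front-panel, management, or other."""
    if FRONT_PANEL.match(ifname):
        return "front-panel"
    if MANAGEMENT.match(ifname):
        return "management"
    return "other"


def software_switching_risk(bridge_members):
    """Return bridge members that are not front-panel switch ports."""
    return [name for name in bridge_members if classify(name) != "front-panel"]


if __name__ == "__main__":
    # Hypothetical bridge membership with a management port mixed in.
    members = ["sw1p1", "sw1p2", "eth0"]
    print(software_switching_risk(members))
```

             | Anything the lint returns is a port whose traffic would
             | have to pass through the CPU.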
        
         | RiverCrochet wrote:
         | To configure Linux to do software switching, you need bridge
         | interfaces and NICs to be made part of the bridge with the
         | appropriate `ip` commands (or `brctl` if you're still using
         | that).
         | 
         | This is common with small home routers - the WLAN and wired LAN
         | ports (all typically appearing as one NIC) will be made part of
         | a bridge `br0`. The four LAN ports aren't typically exposed as
         | separate NICs so there is hardware switching going on there
         | (some devices do let you split them out because they are VLANed
         | internally though).
         | 
         | If `ip link show type bridge` doesn't show any bridges then
         | you aren't software switching unless your drivers are lying to
         | you.
        
           | wmf wrote:
           | None of that applies to switchdev; it's a somewhat different
           | world than normal Linux networking.
        
         | karma_pharmer wrote:
         | _I am curious though what configuration option prevents this
         | from ending up with software switching_
         | 
         | The answer is "switchdev":
         | 
         | https://www.kernel.org/doc/html/latest/networking/switchdev....
         | 
         | The Linux switchdev driver is the awesome magic that says "make
         | hardware-offloaded switching ASICs look just like software
         | switching". It's beautiful and amazing, as you'd expect from
         | Mellanox.
        
       | p1esk wrote:
       | The author mentioned another similar write up which provided more
       | useful explanations and context to me:
       | 
       | https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html
        
       | neilv wrote:
       | I like the photo of smuggling gear out of the data center in a
       | backpack.
        
       | logifail wrote:
       | I have two Mikrotik CRS305-1G-4S+IN (10GbE) plus one CRS504-4XQ-
       | IN (100GbE) switches in my office. Along with a load of other
       | Mikrotik gear.
       | 
       | The 10GbE ones are silent, and the 100GbE one isn't exactly loud,
       | unlike lots of second-hand kit which has come from a DC...
        
         | eqvinox wrote:
         | Benjojo wasn't trying to make something for home or office (or
         | even lab) use, he runs DC installations. The point of the
         | article was to have an as-open-as-possible switch. Mikrotik
         | doesn't qualify for that.
        
       | jmbwell wrote:
       | I want this and I want to put VyOS on it.
        
         | cvalka wrote:
         | VyOS is a great piece of software. I wish more people knew
         | about it!
        
         | gorkish wrote:
         | Having experience with both, I am confident this would not work
         | as well as you would want it to, if at all. VyOS has no
         | native awareness of switchdev interfaces or their limitations
        
         | pa7ch wrote:
         | I wish the edgerouter line by ubiquiti continued as another way
         | to use vyatta.
        
       | wolverine876 wrote:
       | > as close to stock Debian as possible
       | 
       | Is there an OS or another Linux distribution that matches
       | Debian's performance in this respect, without the complexity of
       | an entire Linux system? Could Debian be stripped down (and then
       | how are updates applied)?
        
         | benjojo12 wrote:
         | Sure you can install basically any EFI+amd64 distro you want,
         | that also has the correct kernel module.
         | 
         | I don't really know why you would want to do that (though, RHEL
         | based would be a reasonable 2nd option)
         | 
         | Performance of the distro isn't really a big deal because the OS
         | has so little to do with the day to day packet shuffling. My
         | switch runs at a load average (crappy metric I know, but to
         | give you an idea) of 0.04.
         | 
         | I apply updates with apt update and apt upgrade, as you would a
         | normal Debian server. Only the kernel is pinned since it is
         | special for the switch, however you can rebuild the kernel
         | deb's when you want, as the driver is in the mainline kernel
         | repo.
        
         | gamepsys wrote:
         | I think you are misunderstanding what the author means by
         | "stock Debian." The issue here is that the hardware is so
         | unique that we need to add kernel modules that are not included
         | in the standard Linux kernel that ships with Debian. So we need
         | a custom Linux kernel. Close to stock might also be referring
         | to any config changes the author needed to make in order for
         | the switch to function properly.
         | 
         | This could be done with any Linux. I'm not sure what
         | 'complexity' you are referring to, but the complex open modular
         | nature of the Linux kernel allows us the flexibility to build a
         | custom kernel for this interesting application without
         | unnecessary modules. To be frank all modern operating systems
         | are complex because modern hardware is complex and user
         | expectations are high. Some just do a great job of hiding the
         | complexity from the user. Linux makes an attempt to expose its
         | complexity, but its complexity is about the same as iOS,
         | macOS, Windows, etc.
         | 
         | If you wanted to do without Debian or any distribution you
         | could compile your own kernel, your own user space tools, and
         | then install it on the hardware. At this point updates would
         | need to be applied by pulling code from upstream and compiling
         | it yourself. Look into Linux From Scratch to get an idea of how
         | Linux without a distribution works. However, I don't think the
         | juice is worth the squeeze in this situation.
         | 
         | EDIT: As for "is there another OS", the answer is yeah,
         | probably a dozen. I think a blog post about doing this
         | with openBSD would be interesting because I'm not sure what the
         | exact steps are to install custom drivers on a BSD and openBSD
         | has a very high standard for security. I think the reason it's
         | done on Linux is because there is a lot of expertise about
         | Linux and this type of project is relatively straightforward
         | with Linux.
        
           | wolverine876 wrote:
           | What I meant was simpler OSes like BSDs, as you discuss in
           | the edit, or Linux stripped down to networking essentials.
        
         | MisterTea wrote:
         | Plan 9. Seriously. One person can understand the whole OS
         | including the kernel. File servers coupled with 9p provide an
         | excellent abstraction for distributing services across
         | networks. Source code is included and the build system is
         | designed with cross platform baked in OUT OF THE BOX.
         | 
         | All you need is a driver for the switch chip that serves it as
         | a set of files you can read and write configuration commands
         | to. With the right driver setup you could then configure the
         | chip using human readable textual commands and use a script and
         | standard cmd line tools to configure the switch. We could PXE
         | boot plan 9 and leave the switch diskless. PXE booting plan 9
         | networks is brain dead simple and works out of the box.
        
           | wolverine876 wrote:
           | Interesting. What would it take to write the driver? Also, is
           | Plan 9 maintained sufficiently?
        
       | mschuster91 wrote:
       | > So, I was very happy to learn that a friend had a Mellanox
       | SN2010 that they were not using and were willing to sell to me.
       | 
       | That thing retails new for 10k. You got awesome friends if they
       | have such a thing lying around unused and willing to sell it to
       | you at a discount!
       | 
       | For those of us with less fortunate bank accounts: what's the
       | _smallest_ and reasonably affordable Mellanox model that has a
       | similar featureset in terms of native Linux support?
        
         | benjojo12 wrote:
         | > That thing retails new for 10k. You got awesome friends if
         | they have such a thing lying around unused and willing to sell
         | it to you at a discount!
         | 
         | Tactical eBay (or whatever is the robust 2nd hand market in your
         | region) can yield similar discounts on such hardware. Retail
         | price is often list price, and that is often a large mark up
         | regardless. The 32x100G port version of the same switch goes
         | for around £2000 in the UK 2nd hand market
         | 
         | > what's the smallest and reasonably affordable Mellanox model
         | that has a similar featureset in terms of native Linux support?
         | 
         | The SN2010 is likely the smallest, the SN2700 is likely the
         | cheapest
        
         | 0xbadcafebee wrote:
         | Wait a few years and then root around in the datacenter
         | dumpster once they toss the old gear after tech refresh
         | (between 3-10yrs depending on the gear)
        
           | mschuster91 wrote:
           | No such things around any more, unless you know people who're
           | willing to risk their jobs. Everything's gotta be
           | environmentally certified and whatnot... no one's putting
           | stuff in a dumpster, it all goes to some commercial reseller
           | that either refurbishes and sells on old equipment or hands
           | it to a certified scrapyard.
        
       | KaiserPro wrote:
       | I commend Ben for pushing the boundary of open Software networky-
       | type things.
       | 
       | Even though I have been running HPC clusters for years, I still
       | really don't like core switching infra[1]. It's just a pain in the
       | arse to investigate and run.
       | 
       | [1] I realise that this is very much not a core switch, or even a
       | TOR switch.
        
       | zylent wrote:
       | I encourage playing with these + cumulus linux via the nvidia air
       | lab environment. It may seem limiting at first, but being able to
       | upload a graphviz topology as part of a CI/CD pipeline is
       | extremely powerful.
       | 
       | See: https://github.com/na-son/nvidia-air for bootstrapping a
       | non-EVPN topology.
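       | 
       | As a sketch of the CI/CD angle, a few lines of Python can render
       | a link list as plain Graphviz DOT. Node and port names are
       | hypothetical, and any node attributes the lab environment
       | expects are left out.

```python
def topology_dot(name, links):
    """Render links as Graphviz DOT text.

    Each link is a pair of (node, port) tuples, written with DOT's
    node:port edge syntax.
    """
    lines = [f'graph "{name}" {{']
    for (a_node, a_port), (b_node, b_port) in links:
        lines.append(f'  "{a_node}":"{a_port}" -- "{b_node}":"{b_port}"')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical two-leaf, one-spine lab.
    links = [
        (("leaf01", "swp1"), ("spine01", "swp1")),
        (("leaf02", "swp1"), ("spine01", "swp2")),
    ]
    print(topology_dot("lab", links))
```

       | Keeping the link list in version control means the rendered
       | topology can be regenerated on every pipeline run.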
        
       | frzen wrote:
       | I really enjoyed that writeup. I wonder has anyone played with
       | DPUs in a similar way? I have been trying to think of how to
       | hairpin traffic through some standalone DPUs like an nvidia
       | bluefield or pensando. To make my own >100G east west stateful
       | firewall for a small fraction of the cost of a real option.
       | 
       | A switch like OP's with all traffic passing to and from a DPU to
       | make a poor man's Aruba CX10000
        
         | wmf wrote:
         | Bluefield can also run regular Linux distros and it can run
         | standalone without a host (but you have to power it somehow).
         | https://www.servethehome.com/zfs-without-a-server-using-the-...
         | 
         | All other DPUs seem to be NDA-encrusted.
        
       | MrBrobot wrote:
       | Cool project... for homelab stuff though, I lean towards simpler,
       | cheaper solutions.... Like Mikrotik hardware, and low power mini
       | PCs. I'm not hosting a business out of my house, it all functions
       | perfectly fine.
        
         | sophacles wrote:
         | Great, seems like a setup that works for you. I'm considering
         | going to find one of these switches for my lab - it'll be like
         | the 5th switch in my rack. I like playing with high performance
         | networking, and your mikrotik isn't going to cut it for the
         | stuff I like to experiment/play around with.
         | 
         | (Remember... this is a hobby where people pursue their
         | interests and experiment with things they want to know more
         | about, not a prescribed set of exercises for just one path
         | towards devops engineer.)
        
       | Asmod4n wrote:
       | Would buy something like this as a device you can connect via
       | pcie to your server.
       | 
       | No need to have switches and servers in your rack anymore, every
       | server is a switch and every switch is a server with a 192-thread
       | CPU. Insane.
        
         | wmf wrote:
         | It ends up crazy expensive and complex to configure but go for
         | it.
        
           | Asmod4n wrote:
           | I bet Nvidia is selling something like that but with only 2-4
           | ports for the same price as a 48 port switch...
        
             | wmf wrote:
             | AFAIK DPUs are only ~$2,000 but if you put one in each
             | server it adds up to far more money than a traditional
             | network. They're really not intended to replace TOR
             | switches.
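             | 
             | A rough sketch of that arithmetic. Only the ~$2,000 DPU
             | figure is from above; the server count, switch price and
             | switch count are assumptions picked for illustration.

```python
# Rough cost comparison: one DPU per server vs. a pair of top-of-rack switches.
DPU_PRICE_USD = 2_000          # approximate figure from the comment above
SERVERS = 48                   # assumption: one rack's worth of servers
TOR_SWITCH_PRICE_USD = 15_000  # assumption: hypothetical price per TOR switch
TOR_SWITCHES = 2               # assumption: a redundant pair

dpu_total = DPU_PRICE_USD * SERVERS
tor_total = TOR_SWITCH_PRICE_USD * TOR_SWITCHES

print(f"DPU per server: ${dpu_total:,}")
print(f"TOR pair:       ${tor_total:,}")
print(f"Ratio:          {dpu_total / tor_total:.1f}x")
```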
        
         | karma_pharmer wrote:
         | All the switches in this series _except_ the one in this review
         | actually have two separate boards inside with a PCIe-over-cable
         | connection between them. For example the SN2700 here (the
         | sleeved cable in the third photo; you can't see the connector
         | from the angle it's taken from):
         | 
         | https://ipng.ch/s/articles/2023/11/11/mellanox-sn2700.html
         | 
         | The cable has a SAS (SFF-8087) connector on each end. I bet you
         | can replace it with an SFF-8087-to-Oculink cable:
         | 
         | https://www.amazon.com/chenyang-SFF-8611-SFF-8087-PCI-Expres...
         | 
         | and a PCIe-to-oculink card like one of these:
         | 
         | https://www.amazon.com/Ableconn-PEX-OL153-OCuLink-SFF-8612-A...
         | 
         | (none of the links are affiliate links)
        
       | chgs wrote:
       | No mention of PTP. I'm curious how the switch handles it (and how
       | it is configured from Linux).
       | 
       | I find understanding how Linux networking works really helpful in
       | understanding Mikrotiks (which I use a lot in prod, although I
       | tend to shy away from them for the most critical and demanding
       | services).
        
         | wmf wrote:
         | https://github.com/Mellanox/mlxsw/wiki/Precision-Time-Protoc...
        
         | gorkish wrote:
         | IIRC the half-width versions of these switches are not
         | advertised as having PTP support. I wouldn't have the first
         | clue if it's straightforward to configure it to work with
         | upstream linux but I do use PTP with Dell SmartFabric OS10 so
         | there is likely some way to achieve it.
         | 
         | As impressive as it is, switchdev is a big departure from
         | "Linux networking" though. This is not a great platform if that
         | is your main objective. The interfaces are not "normal" in that
         | respect and huge subsystems of the Linux network stack do not
         | apply.
        
       | karma_pharmer wrote:
       | Having gone through this struggle myself, here's the cheat sheet.
       | You want a device that uses the Linux switchdev driver and is
       | supported by dentOS _whether or not you actually choose to run
       | dentOS on it_ (I run NixOS on my switch):
       | 
       | https://www.kernel.org/doc/html/latest/networking/switchdev....
       | 
       | https://github.com/dentproject/dentOS
       | 
       | Switchdev support means you don't need hardware-specific
       | userspace tools (with their own bizarre syntax to learn) in order
       | to configure the switch.
       | 
       | dentOS support means the device uses a sane bootloader (U-Boot or
       | GRUB) and the only binary blobs on the device will be the ones
       | built into the bootloader (Intel ME, Arm Trusted Firmware) and the
       | switch firmware, which will be part of linux-firmware (and
       | therefore very easy to manage/update).
       | 
       | In particular, looking for these two keywords is how you make
       | sure that the hardware vendor is staying on "their side of the
       | line" between hardware and software. Violations of this line are
       | endemic to 10G+ switching.
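       | 
       | To illustrate the "no hardware-specific tools" point: a small
       | sketch that lists which interfaces report a switch ID through
       | plain sysfs. It assumes the phys_switch_id attribute under
       | /sys/class/net/<iface>/ can only be read on switch ports; the
       | switchdev_ports helper is a made-up name.

```python
from pathlib import Path


def switchdev_ports(sysfs_net: str = "/sys/class/net") -> dict[str, str]:
    """Map interface name -> switch ID for interfaces that report one."""
    root = Path(sysfs_net)
    if not root.is_dir():
        return {}
    ports = {}
    for iface in sorted(root.iterdir()):
        try:
            switch_id = (iface / "phys_switch_id").read_text().strip()
        except OSError:
            # Attribute missing or unsupported: treat as not a switch port.
            continue
        if switch_id:
            ports[iface.name] = switch_id
    return ports


if __name__ == "__main__":
    for name, switch_id in switchdev_ports().items():
        print(f"{name}: switch {switch_id}")
```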
        
         | mongol wrote:
         | I had no idea there were switches you could run NixOS on. What
         | would be an example of such a switch?
        
       | m463 wrote:
       | cool "100000baseSR4"
       | 
       | I remember 10base*, so I find 100000base* pretty amazing.
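       | 
       | The number in front of "base" is the speed in Mb/s, so the jump is
       | easy to work out. A quick sketch (link_mode_speed_mbps is just an
       | illustrative helper name):

```python
import re


def link_mode_speed_mbps(mode: str) -> int:
    """Return the leading number of a link mode name such as '100000baseSR4'."""
    match = re.match(r"(\d+)base", mode, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a <speed>base<type> name: {mode!r}")
    return int(match.group(1))


old = link_mode_speed_mbps("10baseT")
new = link_mode_speed_mbps("100000baseSR4")
print(f"{new // 1000} Gb/s, {new // old}x faster than 10base*")
```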
        
       ___________________________________________________________________
       (page generated 2024-04-24 23:00 UTC)