hngopher.com

       [HN Gopher] Automate Your Network
       ___________________________________________________________________
        
       Automate Your Network
        
       Author : hjuutilainen
       Score  : 104 points
       Date   : 2023-07-03 15:01 UTC (7 hours ago)
        
        
       | runjake wrote:
       | The author states they have evolved from Ansible to pyATS[1], but
       | pyATS is a Cisco project. Given Cisco's poor track record with
       | code projects and open source, I'm not sure this is much of an
       | improvement; IMHO, it's arguably worse.
       | 
       | For possible alternatives, check out NAPALM[2] and Nornir[3].
       | 
       | It's also worth checking out Python for Network Engineers[4].
       | 
       | 1. https://developer.cisco.com/docs/pyats/
       | 
       | 2. https://napalm.readthedocs.io/en/latest/
       | 
       | 3. https://nornir.readthedocs.io/en/latest/
       | 
       | 4. https://pyneng.readthedocs.io/en/latest/index.html
        
         | xnyanta wrote:
         | Had the same reaction as soon as I found out pyATS is a
         | Cisco-specific thing. I run very simple networks for events on
         | shoestring hardware/budgets and built a simple wrapper around
         | my own object model using Python, Jinja and NAPALM to deploy
         | Cisco switches via SSH. It has Terraform-like semantics
         | (plan/apply) and lets me be productive and eliminate config
         | drift. NAPALM does all of the heavy lifting; it is fantastic. I
         | will probably be integrating it with NetBox soon.
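
A minimal sketch of the plan step described above, using only the Python standard library. Here `string.Template` stands in for Jinja and `difflib` stands in for the diff that NAPALM would compute against the device (its `compare_config()`). The port fields and the sample configs are made up for illustration.

```python
import difflib
from string import Template

# Hypothetical object model: one dict per switch port.
PORT_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)


def render_config(ports):
    """Render the desired config text from the object model."""
    return "".join(PORT_TEMPLATE.substitute(port) for port in ports)


def plan(running, desired):
    """Return a unified diff from running to desired ('' means no drift)."""
    diff = difflib.unified_diff(
        running.splitlines(keepends=True),
        desired.splitlines(keepends=True),
        fromfile="running",
        tofile="desired",
    )
    return "".join(diff)


ports = [{"name": "Gi1/0/1", "description": "stage-left", "vlan": 10}]
running = (
    "interface Gi1/0/1\n"
    " description stage-left\n"
    " switchport access vlan 20\n"
)

# "plan": show what would change; an "apply" step would push the rendered text.
print(plan(running, render_config(ports)) or "no changes")
```

An empty plan means the device already matches the model, which is what makes drift visible before anything is pushed.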
        
         | batch12 wrote:
         | Looks like he works for Cisco at the moment. Maybe that has
         | something to do with it.
        
       | betaby wrote:
       | ctrl+f 'yang' - nothing
       | 
       | ctrl+f 'netconf' - nothing
        
       | dvno42 wrote:
       | Hey this is cool! Thanks for sharing your hard work.
       | 
       | I have been living this for the past few years building an
       | automation product[0] and services company to lower the barrier
       | of entry and have tested many of these methodologies. We've also
       | written many different runbooks/playbooks for complicated
       | workflows. I'd like to share a couple experiences/opinions:
       | 
       | NETCONF and vendor APIs are lovely when available and working
       | well. Many devices don't support them, so falling back to SSH
       | (sometimes even telnet) is a must for automation. IMO, you could
       | add value to your book by touching on ktbyers' Netmiko[1] (built
       | on Paramiko) as well as its nuances (timeouts, dealing with
       | interactive prompts, etc).
       | 
       | AAA is a big component of automation too. Having something in
       | place to handle authn/authz (radius/tacacs) enables consistency
       | for access across vendors. This also enables least-privilege
       | accounts and rotation/limited lifetime of creds when used with
       | something like HashiCorp Vault[2]. I think you briefly mentioned
       | secrets management through Ansible Vault.
       | 
       | Another technology that may be worth mentioning is TextFSM[3] in
       | conjunction with Netmiko. When we automate workflows for clients,
       | the data we need is often not easily parsable. Using and
       | expanding on TextFSM makes this doable.
       | 
       | Lastly, much automation may only be one firmware change away from
       | breaking. Even with the big vendors, bugs that are (IME) low
       | priority to the OEM are common. Keep this in mind when writing
       | runbooks/playbooks: try to rely on features and output that are
       | unlikely to change across versions.
       | 
       | [0] https://realmhelm.com
       | [1] https://github.com/ktbyers/netmiko
       | [2] https://github.com/hashicorp/vault
       | [3] https://github.com/google/textfsm
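
To illustrate the parsing problem, here is a rough standard-library sketch in which a named-group regex plays the role a TextFSM template would play. The sample output is invented in the style of an interface status table and is not taken from any specific platform. The parser raises on any line it does not recognise, so a format change after a firmware upgrade fails loudly instead of silently dropping rows.

```python
import re

# Made-up sample in the style of an interface status table.
SAMPLE = """\
Port      Name       Status       Vlan
Gi1/0/1   uplink     connected    trunk
Gi1/0/2              notconnect   10
Gi1/0/3   printer    connected    20
"""

# One row: port, optional name, a known status keyword, then the VLAN column.
ROW = re.compile(
    r"^(?P<port>\S+)\s+(?P<name>\S+)?\s+"
    r"(?P<status>connected|notconnect|disabled)\s+(?P<vlan>\S+)\s*$"
)


def parse_status(output):
    """Parse data rows into dicts; raise if a row does not match."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        match = ROW.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {line!r}")
        rows.append(match.groupdict(default=""))
    return rows


for row in parse_status(SAMPLE):
    print(row)
```

Anchoring on stable keywords (the status values here) rather than on column positions is one way to depend on output that is less likely to shift between versions.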
        
         | Cyph0n wrote:
         | +1 to textfsm: it is an extremely powerful approach to reliably
         | parse CLI-based outputs. I used to do some IOS-XR device
         | automation when I worked at Cisco - mainly for integration
         | testing - and I (and other teams) used it heavily.
         | 
         | This ties in to your point about how you often need to fall back
         | to SSH or Telnet. For example, a lot of platform-specific data
         | isn't exposed through standard interfaces, but almost
         | everything is available through a CLI. There are also times
         | when you have no choice but to use the CLI - for example, when
         | re-imaging or reloading a device.
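
The interactive-prompt and timeout handling mentioned in this subthread can be sketched as an expect-style loop. This is only an illustration: `FakeChannel` is a hypothetical stand-in for an SSH session, and the reload dialogue is invented. Libraries such as Netmiko take care of this against real devices.

```python
import re
import time


class FakeChannel:
    """Hypothetical stand-in for an SSH channel with scripted replies."""

    def __init__(self, replies):
        self.replies = replies  # maps text sent -> text the device prints
        self.buffer = ""
        self.sent = []

    def send(self, text):
        self.sent.append(text)
        self.buffer += self.replies.get(text, "")

    def recv(self):
        data, self.buffer = self.buffer, ""
        return data


def run_interactive(channel, command, answers, prompt=r"#\s*$", timeout=5.0):
    """Send a command, answer known prompts, return once the CLI prompt is back."""
    channel.send(command + "\n")
    output = ""
    pos = 0  # start of the output not yet matched against a prompt
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        output += channel.recv()
        tail = output[pos:]
        for pattern, reply in answers:
            if re.search(pattern, tail):
                channel.send(reply)
                pos = len(output)  # never answer the same prompt twice
                break
        else:
            if re.search(prompt, tail):
                return output
            time.sleep(0.01)
    raise TimeoutError(f"no prompt after {timeout}s; got {output!r}")


# Invented dialogue: the device asks two questions before returning a prompt.
device = FakeChannel({
    "reload\n": "Save? [yes/no]: ",
    "no\n": "Proceed with reload? [confirm]",
    "\n": "\nswitch#",
})
answers = [(r"Save\?", "no\n"), (r"\[confirm\]", "\n")]
print(run_interactive(device, "reload", answers))
```

The explicit deadline matters as much as the prompt table: an unexpected question should end in a `TimeoutError` with the captured output, not a hung job.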
        
       | nu11ptr wrote:
       | I do network automation professionally. I build tools
       | (technically compilers) that take a proprietary object model
       | designed for our private cloud and translate it into Ansible
       | (v1) or Terraform (v2) code. At our company, I actually refer to
       | using these tools in isolation as doing it "manually". This is
       | because the largest benefit of automation, I believe, is the
       | abstraction gained from the new object model and being able to
       | generate and store the inputs for Ansible/Terraform in a
       | database. If you have to track and specify all the inputs into
       | Ansible/Terraform and write the playbooks/HCL manually, it is my
       | experience that you don't actually save all that much work.
       | However, when you have an object model specifically designed for
       | your use case, you can deliver a new client network in literally
       | minutes (essentially nothing more than the cloud model, exactly
       | what AWS, Azure, etc. do for their networking). The downside is
       | that most enterprises don't have people like me to write the code
       | to do this, and writing it for a single deployment would likely
       | not see the gains that we see as a managed service provider.
        
         | jagged-chisel wrote:
         | Are you using an open source tool/stack to do this? Sounds
         | pretty awesome and I'd love to learn!
        
         | jmbwell wrote:
         | There's a push and pull; ansible and terraform both have some
         | facilities for doing what you describe, but of course if you're
         | using both tools, then you wind up where you are, needing yet
         | another layer of abstraction common to both.
         | 
         | In the book, the author presents an approach for storing the
         | object state and organizing the repository for ansible purposes
         | in what is at least as sensible a way as any other I've seen.
         | For installations that might not directly benefit from
         | additional layers of abstraction, managing object model state
         | using ansible's native functionality might well be sufficient.
         | 
         | This is all a legitimate challenge, in any case. Network
         | infrastructure and service instances have some management
         | issues in common, but where they differ, they can differ by
         | quite a bit, in ways that are hard to model at any level of
         | abstraction.
        
           | nu11ptr wrote:
           | I'm not using both. The first version of my tool used
           | Ansible. The second version used Terraform. They were written
           | 4 years apart. My users are not devops savvy. They use
           | runbook forms to call into my API giving them a very simple
           | UI that requires almost zero input. The object model includes
           | lifecycling so certain attributes can be changed, etc. and
           | validation done to ensure only a correct network is output.
           | Not everyone needs this, and I didn't do it because of how
           | I'm using the tools, but to satisfy the business problem I'm
           | trying to solve (automating network deployment with as few
           | human inputs as possible over the entire lifespan of a client
           | and infrastructure).
           | 
           | I wasn't critiquing the author, but networks inherently have
           | a lot of input data. Much of this is not of concern to the
           | end user, hence why public clouds require almost zero input
           | on the network side.
           | 
           | I agree that my object model is purpose built for our
           | product. It would not work for someone else's network.
        
         | xnyanta wrote:
         | This model is probably more common than you think; I don't see
         | how anyone would be doing this any other way in a scalable
         | fashion.
        
         | tmerse wrote:
         | This sounds interesting, but I am not sure I fully understand.
         | Could an analogy be that the object model loosely corresponds
         | to something like the AWS CDK, with the Ansible part being the
         | derived CloudFormation? (Any other analogy would do, but those
         | are things I understand a bit better; I use quite a bit of
         | Ansible, but I am no network person.) I still don't fully
         | understand the database part. Is it a better way to manage env
         | variables, or does it allow for more flexible input?
         | 
         | Thank you
        
           | nu11ptr wrote:
           | Essentially we have a very specific network topology we are
           | trying to build for each of our clients. The goal is to auto-
           | generate as much of the input as possible, validate that
           | which is given, and allow it to be lifecycled (attributes can
           | change, but only in certain valid ways, objects
           | created/changed/deleted, but only if they aren't referenced
           | by other objects, etc). Because of this, a database is needed
           | to store each "object". When the network is "pushed", the
           | database is walked and a fresh set of Ansible (or Terraform
           | for v2) code is generated in seconds.
           | 
           | IOW, it is a custom set of Lego bricks that can only be
           | combined in certain ways to build valid networks. It is
           | proprietary to our cloud product, which has the benefit of
           | allowing us to abstract things away that others probably
           | couldn't, but the downside of making it entirely
           | non-reusable for a different use case.
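
The lifecycle rules described above (validate what is given, refuse to delete objects that are still referenced, then walk the store to generate output) might look roughly like this. It is a toy sketch, not the poster's tool: the `Vlan`, `Port` and `Store` names are hypothetical, and a dict-backed class stands in for the database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vlan:
    name: str
    vlan_id: int


@dataclass(frozen=True)
class Port:
    name: str
    vlan: str  # name of the Vlan this port references


class Store:
    """In-memory stand-in for the database of network objects."""

    def __init__(self):
        self.vlans = {}
        self.ports = {}

    def add_vlan(self, vlan):
        # 802.1Q leaves IDs 1-4094 usable (0 and 4095 are reserved).
        if not 1 <= vlan.vlan_id <= 4094:
            raise ValueError(f"VLAN id {vlan.vlan_id} out of range")
        self.vlans[vlan.name] = vlan

    def add_port(self, port):
        if port.vlan not in self.vlans:
            raise ValueError(f"{port.name} references unknown VLAN {port.vlan}")
        self.ports[port.name] = port

    def delete_vlan(self, name):
        users = [p.name for p in self.ports.values() if p.vlan == name]
        if users:
            raise ValueError(f"VLAN {name} is still referenced by {users}")
        del self.vlans[name]

    def walk(self):
        """Yield every object in dependency order (VLANs before ports)."""
        yield from sorted(self.vlans.values(), key=lambda v: v.vlan_id)
        yield from sorted(self.ports.values(), key=lambda p: p.name)


store = Store()
store.add_vlan(Vlan("audio", 311))
store.add_port(Port("Ethernet13", "audio"))
for obj in store.walk():
    print(obj)
```

Because every mutation goes through the store, an invalid network can never be saved, so the generation step only ever sees consistent input.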
        
         | totallywrong wrote:
         | Isn't that a lot of words to say that you have a custom set of
         | Terraform modules for your needs? If you're describing a
         | different or better way to do it I'm missing it.
        
           | nu11ptr wrote:
           | No. It is a frontend application that works as a CRUD REST
           | API, validates the data, generates what it can, and stores it
           | into a database/IPAM. It can then be changed, viewed,
           | modified, deleted, etc.
           | 
           | When you are ready to deploy I "compile" the object model
           | data into an IR (representing the "network topology") and
           | then make a final pass and translate into HCL for all the
           | various backends.
           | 
           | I'm not saying it's "better" as it has trade-offs. I'm saying
           | for networks specifically, it is the only way I've seen in
           | the real world to give these tools lots of value. Otherwise
           | the network engineers end up spending all their time looking
           | up the input data (vlans, subnets, ips, etc.) which is the
           | part that is most time consuming for manual configuration as
           | well. The validation and auto-generation of the input data is
           | where the value comes in.
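
As a rough illustration of auto-generating the input data and then making a final translation pass, the sketch below allocates a /24 and a VLAN ID per segment with the standard `ipaddress` module and emits Terraform's JSON syntax (Terraform also reads `.tf.json` files). The `example_segment` resource type, the field names and the starting VLAN ID are all hypothetical; the poster's actual IR and backends are not described in the thread.

```python
import ipaddress
import json


def allocate(supernet, segment_names, first_vlan=100):
    """Give each segment the next free /24 and VLAN id (a toy 'IR')."""
    subnets = ipaddress.ip_network(supernet).subnets(new_prefix=24)
    topology = []
    for offset, (name, subnet) in enumerate(zip(segment_names, subnets)):
        topology.append({
            "name": name,
            "vlan_id": first_vlan + offset,
            "cidr": str(subnet),
            "gateway": str(subnet.network_address + 1),
        })
    if len(topology) != len(segment_names):
        raise ValueError("supernet too small for the requested segments")
    return topology


def to_terraform_json(topology):
    """Final pass: translate the IR into Terraform's JSON syntax."""
    resources = {
        seg["name"]: {k: seg[k] for k in ("vlan_id", "cidr", "gateway")}
        for seg in topology
    }
    # "example_segment" is a hypothetical resource type, not a real provider's.
    return json.dumps({"resource": {"example_segment": resources}}, indent=2)


topology = allocate("10.20.0.0/16", ["web", "db"])
print(to_terraform_json(topology))
```

The only human input here is the list of segment names; VLANs, subnets and gateways are derived, which is the part the comment identifies as the time sink.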
        
             | totallywrong wrote:
             | Got it, thanks, makes sense. The way I've frequently seen
             | this done, which is more in line with the IaC and GitOps
             | trends, is people making a PR to the config repo with the
             | required values. Then a pipeline runs and does all
             | validations, pulls data from external sources, and runs the
             | terraform plan. If everything looks good upon review, a
             | merge applies the saved plan.
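
The validation stage of such a pipeline could be as small as the following sketch. The shape of the proposed values (`name`, `vlan_id`, `cidr`) is an assumption made for the example, and `existing_cidrs` stands in for data pulled from an external source such as an IPAM.

```python
import ipaddress


def validate(proposed, existing_cidrs):
    """Return a list of readable errors; an empty list lets the PR proceed."""
    errors = []
    seen_vlans = set()
    taken = [ipaddress.ip_network(c) for c in existing_cidrs]
    for item in proposed:
        name, vlan_id = item["name"], item["vlan_id"]
        if not 1 <= vlan_id <= 4094:
            errors.append(f"{name}: VLAN {vlan_id} out of range 1-4094")
        if vlan_id in seen_vlans:
            errors.append(f"{name}: duplicate VLAN {vlan_id}")
        seen_vlans.add(vlan_id)
        try:
            net = ipaddress.ip_network(item["cidr"])
        except ValueError as exc:  # malformed CIDR or host bits set
            errors.append(f"{name}: {exc}")
            continue
        clash = next((t for t in taken if net.overlaps(t)), None)
        if clash is not None:
            errors.append(f"{name}: {net} overlaps {clash}")
        taken.append(net)
    return errors


proposed = [{"name": "cameras", "vlan_id": 120, "cidr": "10.0.0.128/25"}]
for problem in validate(proposed, ["10.0.0.0/24"]):
    print(problem)
```

Collecting every error instead of stopping at the first one gives the PR author a complete list to fix in one round.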
        
         | tguvot wrote:
         | i worked on a product that did something similar for telecoms.
         | had a closed loop automation and graphical designer for object
         | model. it was 10 years ago.
         | 
         | looking today at all the manual work with playbooks/etc, it's
         | astonishing. feels like things didn't move forward at all in
         | past decade
        
           | dopylitty wrote:
           | Even in the big public clouds the user facing networking
           | really hasn't progressed beyond a layer of lipstick on top of
           | the kludges that were created for connecting physical servers
           | 40 years ago.
           | 
           | For instance in AWS you still have to care about BGP and ASNs
           | if you want to follow the most seamless approach to create a
           | multi-region mesh of VPCs. Why should I have to care about
           | that? AWS already knows where all the packets came from and
           | where they're going and should just put them in the right
           | place. I don't care how they get there and I certainly
           | shouldn't have to care about BGP attributes[1].
           | 
           | 1. https://docs.aws.amazon.com/network-
           | manager/latest/cloudwan/...
        
       | theideaofcoffee wrote:
       | I glanced through the guide and it's Windows and Cisco
       | (specifically IOS) heavy: mentions of the old Cisco architecture
       | via Core/Access/Distribution, where larger DC networks have
       | converged onto spine/leaf setups, CDP/Cisco Discovery Protocol
       | whereas the open-standard LLDP is more generic, even the
       | nomenclature of 802.1q VLAN tags: access versus trunk. But I
       | guess if you are starting to automate a legacy office network, it
       | might be useful.
       | 
       | More recent non-IOS network OSes that lend themselves to
       | automation, especially in the datacenter, such as Cumulus or
       | SONiC, are pure Linux with some ASIC-vendor-specific bits and
       | bobs, so I'm unsure of the applicability of this guide to larger,
       | more modern networks. Tools like Ansible could be a good fit
       | here, but since they are 'just' Linux, you might as well use a
       | dedicated config management tool like Chef or Puppet.
       | 
       | Otherwise I think it's well written for someone in a smaller shop
       | wanting to get their feet wet with ansible and other tools but
       | still stuck on IOS.
        
         | jimmar wrote:
         | > old Cisco architecture via Core/Access/Distribution, where
         | larger DC networks have converged onto spine/leaf setups
         | 
         | Please correct me if I'm wrong, but I see the "old"
         | core/access/distribution layers still relevant. The datacenter
         | spine/leaf setup applies to networking between server racks
         | in the data center.
         | 
         | > 802.1q VLAN tags: access versus trunk
         | 
         | Again, are you saying that these are outdated? I'm not a
         | practicing network engineer, but I know several network
         | engineers and they've told me that understanding 802.1q VLAN
         | tags to segment network traffic has been helpful.
        
           | kazen44 wrote:
           | > Please correct me if I'm wrong, but I see the "old"
           | core/access/distribution layers still relevant. The
           | datacenter spine/leaf setup applies to networking between
           | server racks in the data center.
           | 
           | This is correct. The place where spine-leaf really shines is
           | when used in combination with EVPN-VXLAN. You can then
           | encapsulate every tenant network inside a VXLAN domain and
           | route those between your leaf switches through your spine
           | layer.
           | 
           | This is basically a clos fabric which is non-blocking, and is
           | very easy to expand horizontally. It also gives you nice
           | features like ARP suppression[0]. These features are
           | important in a DC fabric because ARP flooding is traffic
           | which is not revenue generating, and should be minimized as
           | much as possible.
           | 
           | For a normal enterprise/office network, running an EVPN-VXLAN
           | fabric is usually far too complex for the benefits involved.
           | 
           | [0] https://satishdotpatel.github.io/how-does-arp-
           | suppression-wo...
        
           | darkr wrote:
           | > 802.1q VLAN tags: access versus trunk
           | 
           | I think the parent was saying that these are Cisco specific
           | terms; more generic terms would be "untagged" + "tagged".
        
             | ajsnigrutin wrote:
             | Trunk and access ports are like kleenex and bandaids. Yes,
             | technically cisco terminology, but used everywhere.
        
               | iso1631 wrote:
               | Absolutely, here's a config from one of my Aristas (with
               | bits snipped)
               | 
               |     interface Ethernet1
               |        switchport trunk native vlan 899
               |        switchport trunk allowed vlan 801
               |        switchport mode trunk
               |     interface Ethernet13
               |        switchport access vlan 311
               | 
               | And on a Juniper
               | 
               |     set interfaces xe-0/2/1 unit 0 family ethernet-switching interface-mode trunk
               |     set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Mgmt_B
               |     set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Audio_2
               |     ....
               |     set interfaces ge-0/0/19 unit 0 family ethernet-switching interface-mode access
               |     set interfaces ge-0/0/19 unit 0 family ethernet-switching vlan members Audio_2
               | 
               | When Cisco, Arista, Juniper all use access vs trunk it's
               | hardly a vendor specific term
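
The tagged/untagged versus access/trunk discussion maps naturally onto a vendor-neutral port model with one renderer per CLI dialect. The sketch below only emits the line shapes quoted in the comment above; the `Port` class and function names are hypothetical, and native VLAN handling on the Junos-style trunk is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    untagged: str = ""  # VLAN carried without an 802.1Q tag
    tagged: tuple = ()  # VLANs carried with a tag

    @property
    def mode(self):
        # Any tagged VLAN makes the port a "trunk" in CLI terms.
        return "trunk" if self.tagged else "access"


def render_eos_style(port):
    """Render switchport lines in the style of the Arista snippet."""
    lines = [f"interface {port.name}"]
    if port.mode == "trunk":
        if port.untagged:
            lines.append(f"   switchport trunk native vlan {port.untagged}")
        lines.append(f"   switchport trunk allowed vlan {','.join(port.tagged)}")
        lines.append("   switchport mode trunk")
    else:
        lines.append(f"   switchport access vlan {port.untagged}")
    return lines


def render_junos_style(port):
    """Render set-commands in the style of the Juniper snippet."""
    base = f"set interfaces {port.name} unit 0 family ethernet-switching"
    members = port.tagged if port.mode == "trunk" else (port.untagged,)
    lines = [f"{base} interface-mode {port.mode}"]
    lines += [f"{base} vlan members {vlan}" for vlan in members]
    return lines


print("\n".join(render_eos_style(Port("Ethernet1", untagged="899", tagged=("801",)))))
print("\n".join(render_junos_style(Port("ge-0/0/19", untagged="Audio_2"))))
```

Keeping the model in tagged/untagged terms and confining "access" and "trunk" to the renderers leaves room for a platform whose CLI uses different words.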
        
       | metadat wrote:
       | Direct link to the PDF:
       | 
       | https://github.com/automateyournetwork/automate_your_network...
        
       ___________________________________________________________________
       (page generated 2023-07-03 23:01 UTC)