[HN Gopher] Automate Your Network
___________________________________________________________________
Automate Your Network
Author : hjuutilainen
Score : 104 points
Date : 2023-07-03 15:01 UTC (7 hours ago)
| runjake wrote:
| The author states they have evolved from Ansible to pyATS[1], but
| pyATS is a Cisco project. Given Cisco's poor track record with
| code projects and open source, I'm not sure how this is much of
| an improvement, and IMHO, it's arguably worse.
|
| For possible alternatives, check out NAPALM[2] and Nornir[3].
|
| It's also worth checking out Python for Network Engineers[4].
|
| 1. https://developer.cisco.com/docs/pyats/
|
| 2. https://napalm.readthedocs.io/en/latest/
|
| 3. https://nornir.readthedocs.io/en/latest/
|
| 4. https://pyneng.readthedocs.io/en/latest/index.html
| xnyanta wrote:
| Had the same reaction as soon as I found out pyATS is a cisco-
| specific thing. I run very simple networks for events on
| shoestring hardware/budgets and built a simple wrapper around
| my own object model using python, jinja and napalm to deploy
| cisco switches via SSH. Has terraform-like semantics
| (plan/apply) and lets me be productive and eliminate config
| drift. Napalm does all of the heavy lifting, it is fantastic. I
| will probably be integrating it with netbox soon.
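| A minimal sketch of the plan/apply idea described above, using only
| the Python standard library. It is not this commenter's wrapper:
| string.Template stands in for Jinja, the port fields are
| hypothetical, and push is a placeholder for whatever actually
| deploys the config (NAPALM in the comment above).

```python
import difflib
from string import Template

# Hypothetical template and variables; a real setup might use Jinja2 instead.
PORT_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " switchport access vlan $vlan\n"
)


def render(ports):
    """Render the desired config text from a list of port dicts."""
    return "".join(PORT_TEMPLATE.substitute(p) for p in ports)


def plan(running, desired):
    """Return a unified diff from running to desired ('' means no drift)."""
    diff = difflib.unified_diff(
        running.splitlines(),
        desired.splitlines(),
        fromfile="running",
        tofile="desired",
        lineterm="",
    )
    return "\n".join(diff)


def apply(running, desired, push):
    """Call push(desired) only when the plan is non-empty; return True if pushed."""
    if not plan(running, desired):
        return False
    push(desired)
    return True
```

| Reviewing the output of plan() before calling apply() is what gives
| the Terraform-like workflow; an empty plan means the device has not
| drifted from the model.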
| batch12 wrote:
| Looks like he works for Cisco at the moment. Maybe that has
| something to do with it.
| betaby wrote:
| ctr+f 'yang' - nothing
|
| ctr+f 'netconf' - nothing
| dvno42 wrote:
| Hey this is cool! Thanks for sharing your hard work.
|
| I have been living this for the past few years building an
| automation product[0] and services company to lower the barrier
| of entry and have tested many of these methodologies. We've also
| written many different runbooks/playbooks for complicated
| workflows. I'd like to share a couple experiences/opinions:
|
| Netconf and vendor apis are lovely when available and working
| well. Many devices don't support this and falling back to SSH
| (sometimes even telnet) is a must for automation. Imo, you could
| add value to your book by touching on ktbyers' Netmiko (built on
| Paramiko)[1] as well as their nuances (timeouts, dealing
| with interactive prompts, etc).
|
| AAA is a big component of automation too. Having something in
| place to handle authn/authz (radius/tacacs) enables consistency
| for access across vendors. This also enables least privileged
| accounts and rotation/limited lifetime of creds when used with
| something like Hashicorp Vault[2]. I think you briefly mentioned
| secrets management through Ansible Vault.
|
| Another technology that may be worth mentioning is Textfsm[3] in
| conjunction with Netmiko. When we automate workflows for clients,
| the data we need is often not easily parsable. Using and
| expanding on textfsm makes this doable.
|
| Lastly, much automation may only be one firmware change away from
| breaking. Even with the big vendors, bugs are common that are
| (ime) low priority to the OEM. Keep this in mind when writing
| runbooks/playbooks: try to rely on features and output that are
| unlikely to change across versions.
|
| [0]https://realmhelm.com [1]https://github.com/ktbyers/netmiko
| [2]https://github.com/hashicorp/vault
| [3]https://github.com/google/textfsm
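| One way to act on the "one firmware change away from breaking"
| point is to key parsed CLI output by column header rather than by
| position. The sketch below uses plain re as a stand-in for a
| TextFSM template (it is not TextFSM), and it assumes a
| hypothetical table format whose columns are separated by two or
| more spaces.

```python
import re


def parse_table(output):
    """Parse a CLI table into a list of dicts keyed by header name.

    Assumes columns are separated by two or more spaces, so a cell may
    contain single spaces. Rows whose shape does not match the header
    raise ValueError instead of being silently misread.
    """
    rows = [
        re.split(r"\s{2,}", line.strip())
        for line in output.splitlines()
        if line.strip()
    ]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    parsed = []
    for cells in body:
        if len(cells) != len(header):
            raise ValueError(f"unexpected row shape: {cells!r}")
        parsed.append(dict(zip(header, cells)))
    return parsed
```

| Looking fields up by name (row["Status"]) keeps a runbook working
| if a later release adds or reorders columns, and the ValueError
| makes a format change fail loudly rather than feed bad data into
| the next step.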
| Cyph0n wrote:
| +1 to textfsm: it is an extremely powerful approach to reliably
| parse CLI-based outputs. I used to do some IOS-XR device
| automation when I worked at Cisco - mainly for integration
| testing - and I (and other teams) used it heavily.
|
| This ties in to your point about how you often need to fall back
| to SSH or Telnet. For example, a lot of platform-specific data
| isn't exposed through standard interfaces, but almost
| everything is available through a CLI. There are also times
| when you have no choice but to use the CLI - for example, when
| re-imaging or reloading a device.
| nu11ptr wrote:
| I do network automation for a profession. I build tools
| (technically compilers) that take a proprietary object model
| designed for our private cloud and translate that into Ansible
| (v1) or Terraform (v2) code. At our company, I actually call
| using these tools in isolation doing it "manually". This is
| because the largest benefit of automation, I believe, is the
| abstraction gained from the new object model and being able to
| generate and store the inputs for Ansible/Terraform in a
| database. If you have to track and specify all the inputs into
| Ansible/Terraform and write the playbooks/HCL manually it is my
| experience you don't actually save all that much work. However,
| when you have an object model specifically designed for your use
| case, you can deliver a new client network in literally minutes
| (essentially nothing more than the cloud model, exactly what
| AWS/Azure, etc does for their networking). The downside is most
| enterprises don't have people like me to write the code to do
| this, and writing it for a single deployment would likely not see
| the gains that we see as a managed service provider.
| jagged-chisel wrote:
| Are you using an open source tool/stack to do this? Sounds
| pretty awesome and I'd love to learn!
| jmbwell wrote:
| There's a push and pull; ansible and terraform both have some
| facilities for doing what you describe, but of course if you're
| using both tools, then you wind up where you are, needing yet
| another layer of abstraction common to both.
|
| In the book, the author presents an approach for storing the
| object state and organizing the repository for ansible purposes
| in what is at least as sensible a way as any other I've seen.
| For installations that might not directly benefit from
| additional layers of abstraction, managing object model state
| using ansible's native functionality might well be sufficient.
|
| This is all a legitimate challenge, in any case. Network
| infrastructure and service instances have some management
| issues in common, but where they differ, they can differ by
| quite a bit, in ways that are hard to model at any level of
| abstraction.
| nu11ptr wrote:
| I'm not using both. The first version of my tool used
| Ansible. The second version used Terraform. They were written
| 4 years apart. My users are not devops savvy. They use
| runbook forms to call into my API giving them a very simple
| UI that requires almost zero input. The object model includes
| lifecycling so certain attributes can be changed, etc. and
| validation done to ensure only a correct network is output.
| This isn't required by everyone, but it wasn't done out of
| necessity on how I'm using the tools, but to satisfy the
| business problem I'm trying to solve (automate network
| deployment with as few human inputs as possible over the
| entire lifespan of a client and infrastructure).
|
| I wasn't critiquing the author, but networks inherently have
| a lot of input data. Much of this is not of concern to the
| end user, hence why public clouds require almost zero input
| on the network side.
|
| I agree that my object model is purpose built for our
| product. It would not work for someone else's network.
| xnyanta wrote:
| This model is probably more common than you think, I don't see
| how anyone would be doing this any other way in a scalable
| fashion.
| tmerse wrote:
| This sounds interesting, but I am not sure I fully understand.
| Could an analogy be that the object model loosely corresponds to
| something like the AWS CDK, with the Ansible part being the
| derived CloudFormation? (Any other analogy would do, but those
| are things I understand a bit better; I use quite a bit of
| Ansible, but I am no network person.) I still don't fully
| understand the database part. Is it a better way to manage env
| variables, or does it allow for more flexible input?
|
| Thank you
| nu11ptr wrote:
| Essentially we have a very specific network topology we are
| trying to build for each of our clients. The goal is to auto-
| generate as much of the input as possible, validate that
| which is given, and allow it to be lifecycled (attributes can
| change, but only in certain valid ways, objects
| created/changed/deleted, but only if they aren't referenced
| by other objects, etc). Due to this, a database is needed to
| store each "object". When the network is "pushed", the
| database is walked and a fresh set of ansible (or terraform
| for v2) is generated in seconds.
|
| Iow, it is a custom set of lego bricks that can only be
| combined in certain ways to build valid networks. It is
| proprietary to our cloud product which has the benefit of
| allowing us to abstract things away that others probably
| couldn't, but the downside of making it entirely
| non-reusable for a different use case.
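| A toy illustration of the "lego bricks" rules described above:
| objects are validated on entry, cannot be deleted while referenced,
| and the whole store is walked to generate output. This is not the
| commenter's system; the class, its methods, and the emitted config
| lines are hypothetical, and two dicts stand in for the database.

```python
class ModelError(Exception):
    """Raised when a change would leave the model invalid."""


class NetworkModel:
    def __init__(self):
        self.vlans = {}  # vlan_id -> name
        self.ports = {}  # port name -> vlan_id

    def add_vlan(self, vlan_id, name):
        if not 1 <= vlan_id <= 4094:
            raise ModelError(f"VLAN {vlan_id} outside 1-4094")
        if vlan_id in self.vlans:
            raise ModelError(f"VLAN {vlan_id} already exists")
        self.vlans[vlan_id] = name

    def add_port(self, port, vlan_id):
        # A port may only reference a VLAN that is already in the model.
        if vlan_id not in self.vlans:
            raise ModelError(f"{port}: unknown VLAN {vlan_id}")
        self.ports[port] = vlan_id

    def delete_port(self, port):
        if port not in self.ports:
            raise ModelError(f"unknown port {port}")
        del self.ports[port]

    def delete_vlan(self, vlan_id):
        if vlan_id not in self.vlans:
            raise ModelError(f"unknown VLAN {vlan_id}")
        # Refuse to delete an object that other objects still reference.
        users = sorted(p for p, v in self.ports.items() if v == vlan_id)
        if users:
            raise ModelError(f"VLAN {vlan_id} still used by {', '.join(users)}")
        del self.vlans[vlan_id]

    def generate(self):
        """Walk the whole model and emit a fresh list of config lines."""
        lines = []
        for vlan_id in sorted(self.vlans):
            lines += [f"vlan {vlan_id}", f" name {self.vlans[vlan_id]}"]
        for port in sorted(self.ports):
            lines += [f"interface {port}",
                      f" switchport access vlan {self.ports[port]}"]
        return lines
```

| Because generate() always starts from the stored objects, the
| output can be regenerated from scratch on every push instead of
| being edited by hand.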
| totallywrong wrote:
| Isn't that a lot of words to say that you have a custom set of
| Terraform modules for your needs? If you're describing a
| different or better way to do it I'm missing it.
| nu11ptr wrote:
| No. It is a frontend application that works as a CRUD REST
| API, validates the data, generates what it can, and stores it
| into a database/IPAM. It can then be changed, viewed,
| modified, deleted, etc.
|
| When you are ready to deploy I "compile" the object model
| data into an IR (representing the "network topology") and
| then make a final pass and translate into HCL for all the
| various backends.
|
| I'm not saying it's "better" as it has trade-offs. I'm saying
| for networks specifically, it is the only way I've seen in
| the real world to give these tools lots of value. Otherwise
| the network engineers end up spending all their time looking
| up the input data (vlans, subnets, ips, etc.) which is the
| part that is most time consuming for manual configuration as
| well. The validation and auto-generation of the input data is
| where the value comes in.
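| A rough sketch of the two passes described above: auto-generate
| the input data (VLAN IDs and subnets) into a small intermediate
| representation, then translate that into HCL-style text. The
| Segment tuple, the function names, and the "example_segment"
| resource type are all made up for illustration; they are not a
| real Terraform provider or the commenter's code.

```python
import ipaddress
from collections import namedtuple

# One IR node: a fully resolved network segment (hypothetical shape).
Segment = namedtuple("Segment", "name vlan cidr")


def compile_ir(names, supernet, first_vlan=100, prefixlen=24):
    """Assign each named segment the next VLAN ID and the next free subnet."""
    pool = ipaddress.ip_network(supernet).subnets(new_prefix=prefixlen)
    ir = []
    for offset, name in enumerate(names):
        try:
            net = next(pool)
        except StopIteration:
            raise ValueError(
                f"{supernet} has no /{prefixlen} left for {name!r}"
            ) from None
        ir.append(Segment(name, first_vlan + offset, str(net)))
    return ir


def emit_hcl(ir):
    """Final pass: translate the IR into HCL-style resource blocks."""
    blocks = []
    for seg in ir:
        blocks.append(
            f'resource "example_segment" "{seg.name}" {{\n'
            f"  vlan_id = {seg.vlan}\n"
            f'  cidr    = "{seg.cidr}"\n'
            f"}}\n"
        )
    return "\n".join(blocks)
```

| The caller supplies only segment names and a supernet; the VLANs
| and subnets that engineers would otherwise look up by hand are
| derived, which is the part the comment identifies as the real
| time saver.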
| totallywrong wrote:
| Got it thanks, makes sense. The way I've frequently seen
| this done, that goes more in line with the IaC and GitOps
| trends, is people making a PR to the config repo with the
| required values. Then a pipeline runs and does all
| validations, pulls data from external sources, and runs the
| terraform plan. If everything looks good upon review a
| merge applies the saved plan.
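| The validation step of such a pipeline could look roughly like
| this. It is only a sketch: the input shape (a list of dicts with
| name, vlan, and cidr keys) is an assumption, and a real pipeline
| would load the values from the files changed in the PR.

```python
import ipaddress


def validate(segments):
    """Return a list of error strings; an empty list means the checks passed."""
    errors = []
    seen_vlans = {}  # vlan id -> name of the segment that claimed it
    seen_nets = []   # (name, network) pairs accepted so far
    for seg in segments:
        name, vlan = seg["name"], seg["vlan"]
        if not 1 <= vlan <= 4094:
            errors.append(f"{name}: VLAN {vlan} outside 1-4094")
        elif vlan in seen_vlans:
            errors.append(f"{name}: VLAN {vlan} already used by {seen_vlans[vlan]}")
        else:
            seen_vlans[vlan] = name
        try:
            # Strict parsing rejects values with host bits set, e.g. 10.0.0.1/24.
            net = ipaddress.ip_network(seg["cidr"])
        except ValueError as exc:
            errors.append(f"{name}: bad subnet ({exc})")
            continue
        for other_name, other in seen_nets:
            if net.version == other.version and net.overlaps(other):
                errors.append(f"{name}: {net} overlaps {other_name} ({other})")
        seen_nets.append((name, net))
    return errors
```

| A CI job would fail the PR when validate() returns anything, and
| only run the plan step once the list is empty.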
| tguvot wrote:
| i worked on a product that did something similar for telecoms.
| had a closed loop automation and graphical designer for object
| model. it was 10 years ago.
|
| looking today at all the manual work with playbooks/etc, it's
| astonishing. feels like things didn't move forward at all in
| past decade
| dopylitty wrote:
| Even in the big public clouds the user facing networking
| really hasn't progressed beyond a layer of lipstick on top of
| the kludges that were created for connecting physical servers
| 40 years ago.
|
| For instance in AWS you still have to care about BGP and ASNs
| if you want to follow the most seamless approach to create a
| multi-region mesh of VPCs. Why should I have to care about
| that? AWS already knows where all the packets came from and
| where they're going and should just put them in the right
| place. I don't care how they get there and I certainly
| shouldn't have to care about BGP attributes[1].
|
| 1. https://docs.aws.amazon.com/network-
| manager/latest/cloudwan/...
| theideaofcoffee wrote:
| I glanced through the guide and it's Windows and Cisco
| (specifically IOS) heavy: mentions of the old Cisco architecture
| via Core/Access/Distribution, where larger DC networks have
| converged onto spine/spline setups, CDP (Cisco Discovery Protocol)
| whereas the open-standard LLDP is more generic, even the
| nomenclature of 802.1q VLAN tags: access versus trunk. But I
| guess if you are starting to automate a legacy office network, it
| might be useful.
|
| More recent non-IOS network OSes that lend themselves to
| automation, especially in the datacenter, the likes of Cumulus or
| SONiC are pure linux with some asic-vendor-specific bits and
| bobs, so I'm unsure of the applicability of this guide to larger,
| more modern networks. Tools like ansible could be a good fit
| here, but since they are 'just' linux, might as well use a
| dedicated config management tool like chef or puppet.
|
| Otherwise I think it's well written for someone in a smaller shop
| wanting to get their feet wet with ansible and other tools but
| still stuck on IOS.
| jimmar wrote:
| > old Cisco architecture via Core/Access/Distribution, where
| larger DC networks have converged onto spine/spline setups
|
| Please correct me if I'm wrong, but I see the "old"
| core/access/distribution layers still relevant. The datacenter
| spine/spline setup applies to networking between server racks
| in the data center.
|
| > 802.1q VLAN tags: access versus trunk
|
| Again, are you saying that these are outdated? I'm not a
| practicing network engineer, but I know several network
| engineers and they've told me that understanding 802.1q VLAN
| tags to segment network traffic has been helpful.
| kazen44 wrote:
| > Please correct me if I'm wrong, but I see the "old"
| core/access/distribution layers still relevant. The
| datacenter spine/spline setup applies to networking between
| server racks in the data center.
|
| this is correct. The place where spine-leaf really shines is
| when used in combination with evpn-vxlan. You can then
| encapsulate every tenant network inside a VXLAN domain and
| route those between your leaf switches through your spine
| layer.
|
| This is basically a clos fabric which is non-blocking, and is
| very easy to expand horizontally. It also gives you nice
| features like ARP suppression[0]. These features are
| important in a DC fabric because ARP flooding is traffic
| which is not revenue generating, and should be minimized as
| much as possible.
|
| For normal Enterprise/Office network, running an evpn-vxlan
| fabric is usually far too complex for the benefits involved.
|
| [0] https://satishdotpatel.github.io/how-does-arp-
| suppression-wo...
| darkr wrote:
| > 802.1q VLAN tags: access versus trunk
|
| I think the parent was saying that these are Cisco specific
| terms; more generic terms would be "untagged" + "tagged".
| ajsnigrutin wrote:
| Trunk and access ports are like kleenex and bandaids. Yes,
| technically cisco terminology, but used everywhere.
| iso1631 wrote:
| Absolutely, here's a config from one of my Aristas (with
| bits snipped)
|
|     interface Ethernet1
|        switchport trunk native vlan 899
|        switchport trunk allowed vlan 801
|        switchport mode trunk
|     interface Ethernet13
|        switchport access vlan 311
|
| And on a Juniper
|
|     set interfaces xe-0/2/1 unit 0 family ethernet-switching interface-mode trunk
|     set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Mgmt_B
|     set interfaces xe-0/2/1 unit 0 family ethernet-switching vlan members Audio_2
|     ....
|     set interfaces ge-0/0/19 unit 0 family ethernet-switching interface-mode access
|     set interfaces ge-0/0/19 unit 0 family ethernet-switching vlan members Audio_2
|
| When Cisco, Arista, Juniper all use access vs trunk it's
| hardly a vendor specific term
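| Since both vendors share the access/trunk vocabulary, one port
| description can drive both syntaxes. The sketch below only
| reproduces the line shapes from the two snippets above; the port
| dict layout and function names are assumptions, and native VLAN
| handling is left out.

```python
def _check(port):
    """Reject modes other than access/trunk and multi-VLAN access ports."""
    if port["mode"] not in ("access", "trunk"):
        raise ValueError(f"{port['name']}: unknown mode {port['mode']!r}")
    if port["mode"] == "access" and len(port["vlans"]) != 1:
        raise ValueError(f"{port['name']}: access port needs exactly one VLAN")


def render_eos_style(port):
    """Render 'switchport' lines in the style of the Arista snippet."""
    _check(port)
    lines = [f"interface {port['name']}"]
    if port["mode"] == "trunk":
        allowed = ",".join(str(v) for v in port["vlans"])
        lines.append(f"   switchport trunk allowed vlan {allowed}")
        lines.append("   switchport mode trunk")
    else:
        lines.append(f"   switchport access vlan {port['vlans'][0]}")
    return lines


def render_junos_style(port, vlan_names):
    """Render 'set' lines in the style of the Juniper snippet.

    vlan_names maps VLAN IDs to the names used in 'vlan members'.
    """
    _check(port)
    base = f"set interfaces {port['name']} unit 0 family ethernet-switching"
    lines = [f"{base} interface-mode {port['mode']}"]
    lines += [f"{base} vlan members {vlan_names[v]}" for v in port["vlans"]]
    return lines
```

| The mode field carries the same two values for both renderers,
| which is the point being made: only the surrounding syntax is
| vendor specific.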
| metadat wrote:
| Direct link to the PDF:
|
| https://github.com/automateyournetwork/automate_your_network...
___________________________________________________________________
(page generated 2023-07-03 23:01 UTC)