[HN Gopher] Setup AWS Cloudwatch Monitoring and Alerts Using Bas...
       ___________________________________________________________________
        
       Setup AWS Cloudwatch Monitoring and Alerts Using Bash Scripts
        
       Author : sks147
       Score  : 30 points
       Date   : 2021-06-27 15:00 UTC (8 hours ago)
        
 (HTM) web link (themythicalengineer.com)
 (TXT) w3m dump (themythicalengineer.com)
        
       | grouphugs wrote:
       | the world would be better off abandoning the bezos' empire
        
       | coredog64 wrote:
       | High CPU alerts are terrible alerts. If I'm paying per instance,
       | I _want_ CPU utilization to be high. If it's low, I'm wasting
       | money. So now what I need is an alert where it's not high, but
       | somewhere between "high and too high". You know, like when
       | there's an arbitrary spike because the Java is doing some GC. Or
       | you have a one minute spike of traffic that fires an Ops Genie
       | alert at 2am but auto-clears between when the on-call engineer
       | wakes up and when they log in to check.
       | 
       | For the love of $DEITY, if you're going to set up CloudWatch
       | monitoring, create custom metrics that map to your business
       | outcomes and alert when _those_ go off the rails.
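
A minimal sketch of publishing such a custom business metric, assuming Python with boto3. The namespace `MyShop/Checkout`, the metric name `OrdersCompleted`, and the helper names are hypothetical and not from the thread. Only `publish` touches AWS; the builder is plain Python so the payload can be inspected offline.

```python
from datetime import datetime, timezone


def build_metric_payload(namespace, name, value, unit="Count", dimensions=None):
    """Build keyword arguments for CloudWatch's PutMetricData call."""
    datum = {
        "MetricName": name,
        "Value": float(value),
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }
    if dimensions:
        # CloudWatch expects dimensions as a list of Name/Value pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return {"Namespace": namespace, "MetricData": [datum]}


def publish(payload):
    """Send the payload to CloudWatch; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_data(**payload)


# Hypothetical business metric: orders completed in the last minute.
payload = build_metric_payload(
    "MyShop/Checkout", "OrdersCompleted", 42, dimensions={"Environment": "prod"}
)
```

Calling `publish(payload)` from the application or a scheduled job would then give an alarm something tied to a business outcome to watch, rather than raw CPU.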
        
         | vergessenmir wrote:
         | Custom CloudWatch metrics are expensive to write to, which makes
         | them useful only for coarse-grained, high-level service metrics.
         | If you can afford it, go ahead, but setting up some other
         | cloud-native monitoring service may be the way to go.
        
         | dimitar wrote:
         | If you are running some software that requires an instance but
         | is not expected to create load, you can put it on a burstable
         | instance and set up such an alert, so you know when it is time
         | to upgrade.
        
       | orf wrote:
       | Not sure why you'd ever do this instead of using terraform.
        
         | hughrr wrote:
         | As someone who spends two hours a day dealing with buggered
         | terraform state and upgrading terraform and dealing with
         | terraform bugs I can see it.
         | 
         | It's one of those things that really works pretty well but
         | there are enough edge cases to make it slightly soul sucking.
        
           | orf wrote:
           | Would it be more soul sucking than emulating it with bash, as
           | the article is almost suggesting?
        
             | hughrr wrote:
             | About the same. Just nice for stuff to suck in a different
             | way occasionally.
        
           | gizdan wrote:
           | This sounds like a lack of understanding of terraform. We use
           | Terraform pretty heavily and I've rarely seen bad states
           | across our whole org, and the few that I do see are usually
           | people who don't know the core concepts (often non-devops
           | engineers).
           | 
           | Terraform has its faults, but it is the best in its class,
           | especially when you need to manage infrastructure beyond a
           | single cloud provider (e.g. we manage our datadog monitors
           | and dashboard, pagerduty alerts and much more). The only
           | other thing that would probably thrash it is pulumi, which
           | has similar concepts, except you can use many different
           | languages as opposed to HCL (no, CDK doesn't count because it
           | is still very immature and last I checked it only supported
           | one or two languages).
        
             | hughrr wrote:
             | I completely agree with your points there and that is
             | probably the issue.
        
           | ldoughty wrote:
           | Agreed, I swapped my team from Terraform to Ansible to SAM...
           | SAM has been the most reliable and resilient and stable for
           | my use cases (general serverless)
        
             | rantwasp wrote:
             | SAM is cloudformation. cloudformation is the thing to use
             | if you're on the AWS cloud
        
               | void_mint wrote:
               | CloudFormation is without a doubt the worst cloud
               | technology I have ever used.
        
               | coredog64 wrote:
               | It's at least second or third worst. Worst would be
               | writing your own deployment tool that does what
               | CloudFormation (or TF or Pulumi) do. Second worst would
               | be writing a tool that uses a templating language to
               | generate CloudFormation and only using that.
        
               | void_mint wrote:
               | Yeah I wasn't really considering home rolled stuff.
               | Officially supported tech.
        
               | rantwasp wrote:
               | lol. what's next? you used terraform and it was awesome?
        
           | manderson89 wrote:
           | If you don't like Terraform then you should use
           | CloudFormation, not bash scripts.
        
             | hughrr wrote:
             | Oh no that's even worse.
        
               | manderson89 wrote:
               | If you prefer imperative infrastructure creation to
               | declarative then I think you're doing something wrong.
               | Both Terraform and CloudFormation are quite easy to
               | manage compared to writing and managing scripts (bash or
               | otherwise).
        
               | hughrr wrote:
               | I'm only having a gripe. I use terraform because it's the
               | least bad tool, not because it's the best. I wish for
               | better.
        
         | qvrjuec wrote:
         | Or CDK... If you're writing code to generate infra, why jump
         | through more hoops than you need to?
        
         | miyuru wrote:
         | Yes.
         | 
         | Using terraform for this is great because it removes the
         | unwanted alarms.
         | 
         | I had to create alarms when the instances auto scale and wrote
         | a python script using cdktf and now the Jenkins job handles it.
         | It even updates the cloudwatch dashboard.
        
         | codingwageslave wrote:
         | Cdk is a thousand times better
        
       | amirkdv wrote:
       | We've been looking at making CloudWatch (CW) alarms an automated
       | part of our infra. Here are some findings that may help:
       | 
       | - The semantics of CW seem convoluted. But once you stare at API
       | docs for long enough, the core concepts are easy to grok: Metrics
       | (regularly submitted from machine to CW), Alarms (abstractions
       | for defining the logic of an alarm based on behavior of Metrics),
       | and SNS Topics (could be just an email address, for what to do
       | when an Alarm goes off).
       | 
       | - Once you get the data model right, all implementations (click
       | ops, terraform, bash via awscli, boto3, etc.) are visibly
       | identical.
       | 
       | - Some Metrics come for free, e.g. CPU usage is reported by any
       | EC2 instance to CW. For some other Metrics, notably disk and
       | memory usage, you need to configure your instance to report them
       | to CW. This is where the OP's monitoring scripts come in.
       | 
       | - The monitoring scripts and the cron config the OP refers to are
       | deprecated [0]. Instead there's a new CloudWatch Agent [1]: you
       | install the package on your EC2 instances, provide a
       | configuration file to it, and you're set.
       | 
       | [0] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scri...
       | 
       | [1]
       | https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...
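
The Metric / Alarm / SNS Topic data model described above can be sketched as the arguments to a single PutMetricAlarm call. This is a rough illustration assuming Python with boto3; the alarm name, instance ID, topic ARN, and helper names are hypothetical. Requiring 4 breaching datapoints out of 5 is one possible way to keep a single one-minute spike from firing the alarm.

```python
def build_alarm(name, namespace, metric, threshold, topic_arn, dimensions,
                period=60, evaluation_periods=5, datapoints_to_alarm=4):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError(
            "datapoints_to_alarm must be between 1 and evaluation_periods"
        )
    return {
        "AlarmName": name,
        # The Metric this Alarm watches.
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dimensions.items())],
        "Statistic": "Average",
        "Period": period,  # seconds per datapoint
        # Alarm only when M of the last N datapoints breach the threshold.
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        # The SNS Topic that is notified when the Alarm goes off.
        "AlarmActions": [topic_arn],
    }


def create_alarm(alarm):
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    boto3.client("cloudwatch").put_metric_alarm(**alarm)


# Hypothetical values for illustration only.
alarm = build_alarm(
    name="web-1-sustained-high-cpu",
    namespace="AWS/EC2",
    metric="CPUUtilization",
    threshold=90,
    topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts",
    dimensions={"InstanceId": "i-0123456789abcdef0"},
)
```

The same dictionary maps closely onto the equivalent Terraform resource or awscli flags, which is roughly what "visibly identical" implementations look like in practice.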
        
         | jasonpeacock wrote:
         | You can install the CWAgent on any host, they don't have to be
         | EC2 instances:
         | 
         | https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitori...
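
For the agent configuration file mentioned in this subthread, a minimal sketch that generates one covering memory and disk usage. It assumes the agent's JSON schema with `metrics_collected`, `mem`, and `disk` sections; the function name and defaults are hypothetical, and the field names should be checked against the agent documentation before use.

```python
import json


def build_agent_config(interval_seconds=60, disk_paths=("/",)):
    """Return a CloudWatch Agent config dict reporting memory and disk usage."""
    return {
        "metrics": {
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    # Mount points to report on.
                    "resources": list(disk_paths),
                    "metrics_collection_interval": interval_seconds,
                },
            }
        }
    }


# Serialize to JSON; write this string to the file the agent is started with.
config_json = json.dumps(build_agent_config(), indent=2)
```

Generating the file from code rather than hand-editing it makes it easier to keep EC2 and non-EC2 hosts on the same configuration.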
        
       ___________________________________________________________________
       (page generated 2021-06-27 23:01 UTC)