hngopher.com

       [HN Gopher] Terraform is not the golden hammer
       ___________________________________________________________________
        
       Author : kiyanwang
       Score  : 134 points
       Date   : 2021-09-19 10:44 UTC (12 hours ago)
        
 (HTM) web link (hub.qovery.com)
        
       | d0gsg0w00f wrote:
       | IME, Terraform is great for "fire and forget" deployments.
       | However, if you're trying to use it to continuously bring older
       | existing deployed infra up to date it can get tricky. I strongly
       | suggest self versioning your TF files and following strict
       | pretested upgrade paths.
        
         | robertlagrant wrote:
         | If you just use a cloud provider's UI you will have no
         | separation between desired and actual states. Then, whatever
         | people change it to is the only truth.
        
         | nanis wrote:
         | > Terraform is great for "fire and forget" deployments.
         | 
         | In fact, terraform is great for exactly the opposite: making
         | sure that both the initial infrastructure and subsequent
         | modifications to it are first proposed, discussed and reviewed
         | in code and only applied afterwards, instead of anyone with
         | privileges tinkering with settings on an "as needed" basis in
         | whatever console and thereby ending up with an infrastructure
         | where you have no idea who turned on the frobnicator, why it
         | was set to 11, and what the consequences of changing the
         | setting might be.
        
           | d0gsg0w00f wrote:
           | We experienced a lot of problems when trying to manage
           | standard configurations of hundreds of AWS VPCs ranging in
           | age from 3 days old to 6 years. Accounts built years ago
           | using older terraform would have to be handled completely
           | differently because inevitable TF template drift and TF
           | versions made each upgrade path unique. Not insurmountable by
           | any means but also not trivial. Just sharing my experience.
        
       | 3np wrote:
       | The author seems to have some misunderstandings on how Terraform
       | is supposed to work - you should get the "automatic
       | reconciliation" they're saying is missing. Also,
       | 
       | > I run once again the "terraform apply" command. But for some
       | reason, Cloudflare API doesn't answer and I got completely stuck
       | there without the possibility to update with Terraform this field
       | because of linked dependencies.
       | 
       | You should be able to circumvent this with a `-target`.
       | 
       | That being said, I know exactly what they're talking about with
       | helm. IME the helm provider was/is a complete mess and gets
       | inconsistent state a lot. Helm specifically I would also keep out
       | of TF until that is fixed, if ever. I haven't had that happen
       | with other providers, though. Perhaps OP was just really unlucky
       | ending up with the odd half-broken AWS module.
        
       | pojzon wrote:
       | I don't have an issue with terraform, as it has very clearly
       | defined use cases.
       | 
       | I have issues with all the providers that make no sense, like
       | application configuration providers or all the flavors of kubectl
       | providers.
       | 
       | Those are often very low quality and have various issues that
       | dedicated solutions don't have.
       | 
       | An example could be helm and helm_provider. The former just
       | works; with the latter I'm constantly running into weird bugs
       | that break terraform state.
        
       | digianarchist wrote:
       | You can run terraform apply against a particular resource which
       | will only provision that resource and its dependencies.
       | 
       | tfstate files can be painful to manage, we had a lot of trouble
       | with them at Capital One but mostly because:
       | 
       | 1. People would modify state outside of TF which you should
       | avoid.
       | 
       | 2. People didn't architect their apps well, which led to
       | long-lived infra. TF works best with cattle-like infra.
       | 
       | terraform import feels very much like an afterthought which is
       | why projects like terraformer exist.
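
The targeted apply mentioned in this thread (`-target`) provisions a resource plus whatever it depends on. A minimal Python sketch of that selection, using the standard-library `graphlib`; the resource addresses and the dependency map are hypothetical, and this only illustrates the idea, not Terraform's internal graph code:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each resource -> the resources it depends on.
DEPENDS_ON: dict[str, set[str]] = {
    "aws_vpc.main": set(),
    "aws_subnet.a": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.a", "aws_security_group.web"},
    "cloudflare_record.www": {"aws_instance.web"},
}


def closure(target: str, deps: dict[str, set[str]]) -> set[str]:
    """Return the target plus all of its transitive dependencies."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(deps[node])
    return seen


def targeted_order(target: str, deps: dict[str, set[str]]) -> list[str]:
    """Order only the targeted subgraph so dependencies come first."""
    selected = closure(target, deps)
    subgraph = {name: deps[name] & selected for name in selected}
    return list(TopologicalSorter(subgraph).static_order())


order = targeted_order("aws_instance.web", DEPENDS_ON)
print(order)
```

The DNS record that depends on the instance is left out, because targeting walks dependencies, not dependents.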
        
       | picardo wrote:
       | Terraform can be frustratingly slow at times. You have to realize
       | that at the end of the day, it's an abstraction layer on top of
       | the public APIs of a cloud service. If all your services are
       | hosted on a single cloud, you don't need Terraform.
       | 
       | We saw a huge improvement in our build times after we started
       | using AWS CDK directly.
        
         | amarshall wrote:
         | What does the AWS CDK do differently than use public APIs?
        
           | scrollaway wrote:
           | CDK generates cloudformation stacks. Those stacks are
           | deployed as units, within AWS itself. AWS treats all the
           | resources as part of that stack etc.; it's a concept entirely
           | specific to AWS.
           | 
           | Terraform can create cloudformation stacks as well, you just
           | have to write the resources for it. It doesn't really make
           | sense to do that. I also don't know that it's ... "faster" in
           | any way; cfn is really slow.
        
             | orf wrote:
             | It's just tags on the resources and a managed statefile.
             | There isn't anything different between a bucket created via
             | CDK and a bucket created via terraform, the resources are
             | the same and the API calls to create them are also the
             | same.
        
         | kenerwin88 wrote:
         | Hmm, as a former AWS employee who has used both heavily, my
         | experience has been the opposite.
         | 
         | Terraform's AWS provider calls the APIs directly, whereas CDK
         | generates Cloudformation, an abstraction on top of the AWS
         | APIs. For me, using Terraform was significantly faster than
         | applying the same stack via CDK.
         | 
         | Or do you mean you're able to iterate faster writing CDK vs TF?
        
           | picardo wrote:
           | Thinking back on it, we always used Terraform with Pulumi,
           | which creates its own abstraction layer for a CF stack. It's
           | hard to pinpoint where the root cause of the slowness was,
           | but in principle having fewer abstractions allowed us to
           | iterate faster, and fix the bugs more quickly.
        
         | snom380 wrote:
         | From my experience, terraform is almost always slow because
         | it's making API calls out to the cloud providers, and a lot of
         | that in turn is slow because many providers offer "eventually
         | consistent" APIs, which terraform needs to compensate for by
         | doing roundtrips to validate that changes have become visible
         | (applies failing because of that was a common problem in the
         | early days of terraform).
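
The "roundtrip until the change is visible" idea above can be sketched as a polling helper with exponential backoff. This is a minimal Python illustration under stated assumptions: the `read` and `is_ready` callables are hypothetical stand-ins for a provider's read call and consistency check, and the delay numbers are arbitrary; real providers ship their own waiters and timeouts.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_visible(
    read: Callable[[], T],
    is_ready: Callable[[T], bool],
    timeout: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
) -> T:
    """Poll `read` until `is_ready` accepts its result or `timeout` expires.

    The delay between attempts doubles up to `max_delay`, so a slow,
    eventually consistent API is not hammered with requests.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        value = read()
        if is_ready(value):
            return value
        if time.monotonic() + delay > deadline:
            raise TimeoutError("change did not become visible before the timeout")
        time.sleep(delay)
        delay = min(delay * 2, max_delay)


# Usage with a fake API whose read only reflects the change on the third call.
responses = iter([None, None, {"id": "sg-123"}])
result = wait_until_visible(
    read=lambda: next(responses),
    is_ready=lambda value: value is not None,
    initial_delay=0.01,
)
```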
        
         | ulzeraj wrote:
         | I'd rather devote my time learning an agnostic tool like
         | terraform. I can be part of your team right now working on AWS
         | but tomorrow I might be working for an Azure shop.
        
         | qaq wrote:
         | CDK synth is reasonably fast but CF is really slow
        
       | gjhr wrote:
       | > When you run Terraform against AWS on the subnets part, it will
       | create (anytime you deploy) the missing subnets
       | 
       | That is one of the core features of Terraform? Detecting and
       | fixing drift is useful.
        
       | NKosmatos wrote:
       | Came here to read about planet terraforming but instead I learned
       | about (yet another) cloud deployment tool :-)
        
         | scrollaway wrote:
         | Ah, congrats on being part of today's ten thousand.
         | 
         | I very highly recommend investigating it more and trying it a
         | bit. Terraform isn't mere cloud deployment.
         | 
         | As a small project you can start by deploying an EC2, RDS and
         | some cloudflare records to go with them, all linked together
         | with terraform. This will give you an initial idea of its
         | capabilities.
        
         | rad_gruchalski wrote:
         | > learned about (yet another) cloud deployment tool
         | 
         | Please don't read this as an attack; that's not the intent.
         | Amazing that an HN regular can "learn about Terraform" only in
         | the latter part of 2021!
        
           | jhgb wrote:
           | I made the same mistake. First hearing about this. It turns
           | out that different people may have different backgrounds! ;)
        
           | johannes1234321 wrote:
           | https://xkcd.com/1053/
           | 
           | I assume there is some selective reading and most articles
           | referring to Terraform have a cloud or hashicorp reference in
           | the title. If you don't care about either, you don't read the
           | Terraform things on HN.
        
       | nojvek wrote:
       | Yeah the tfstate was a big gotcha. Pulumi has the same problems.
       | 
       | What we really need is automatic reconciliation, i.e. ask the
       | provider what they have and then diff against that.
       | 
       | Or periodically auto-importing.
       | 
       | Are there any good solutions to auto-importing?
        
         | robertlagrant wrote:
         | What's the point of reimporting? Aren't you then no longer
         | separating desired state from actual?
        
         | kall wrote:
         | Hm, maybe what we really need is a new entrant in hyperscale
         | clouds that is built from the ground up for IaC and just does
         | away with the split between state and reality. I would love to
         | see one, anyway.
        
         | arpinum wrote:
         | The hard part with auto-importing is the resource id. This is
         | usually generated server-side and is not included in the hcl.
         | Often resources define a user-supplied identifier as well, such
         | as a name property, and this could be used for auto-importing
         | if the property has unique constraints applied against it.
         | However, not all resources have this feature, so it's not a
         | universal solution.
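
The constraint described above (auto-import only works when a user-supplied identifier is unique) can be sketched in a few lines of Python. The `RemoteResource` shape and the listing it stands for are hypothetical; this is not how terraformer or any existing importer is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteResource:
    """A resource as reported by a hypothetical cloud API listing call."""
    server_id: str        # generated server-side, never present in the HCL
    name: Optional[str]   # user-supplied identifier, if the type has one


def plan_imports(declared_names: list[str], remote: list[RemoteResource]):
    """Match declared resources to remote ones by name.

    Returns (imports, unmatched): `imports` maps a declared name to the
    server-side id to import; `unmatched` lists names with no safe match.
    """
    by_name: dict[str, list[str]] = {}
    for r in remote:
        if r.name is not None:
            by_name.setdefault(r.name, []).append(r.server_id)

    imports: dict[str, str] = {}
    unmatched: list[str] = []
    for name in declared_names:
        candidates = by_name.get(name, [])
        if len(candidates) == 1:   # unique name: safe to import
            imports[name] = candidates[0]
        else:                      # missing or ambiguous: give up
            unmatched.append(name)
    return imports, unmatched


remote = [
    RemoteResource("vpc-0a1", "prod-vpc"),
    RemoteResource("sg-111", "web"),
    RemoteResource("sg-222", "web"),   # name is not unique for this type
    RemoteResource("eni-9f", None),    # type has no user-supplied name
]
imports, unmatched = plan_imports(["prod-vpc", "web", "db"], remote)
```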
        
       | scrollaway wrote:
       | It's very difficult to communicate what Terraform's strengths and
       | weaknesses are to someone who's never used it or IAC in general.
       | 
       | Spend enough time playing with it and understanding it, and you'll
       | end up like me thinking about all the shit you configure left and
       | right such as hooking up Stripe's secret keys, Google Analytics
       | and the webmaster console, and just about everything else we
       | configure via web interfaces, and you'll think:
       | 
       | Why can't we use Terraform for this as well? Manage these SaaS
       | products the same way we manage the rest of our cloud, test and
       | audit changes, automatically roll secrets and update anything
       | that needs updating the moment you change a setting.
       | 
       | Ah well. Not enough APIs out there. And it's difficult to write
       | and maintain terraform plugins for these throwaway cases
       | especially if they are going to use private APIs. Anyone know if
       | Pulumi plugins are easier to write?
        
         | brightball wrote:
         | IMO, this is an area where Terraform + Ansible pair really
         | well together.
         | 
         | If there's ever a gap in what Terraform offers you can pretty
         | easily fill it with Ansible.
        
           | weitzj wrote:
           | And this setup is also encouraged by HashiCorp (at least I
           | saw a talk by them). Use ansible for your "smart" sequential
           | executions and Terraform as a sane wrapper for state.
        
             | chucky_z wrote:
             | I've found running Terraform via Ansible to be a pretty
             | good experience.
        
               | weitzj wrote:
               | So the other way around. Do you employ a GitOps approach
               | this way?
               | 
               | I find it hard to figure out how to use GitOps with
               | Ansible. How do you make a pull request which indicates
               | that something should get deleted? You would still have
               | to keep around an Ansible playbook for the stuff you want
               | to delete.
        
             | Octabrain wrote:
             | I've seen (and fixed) so many ugly messes at this point
             | made as a result of mixing and wrapping tools with
             | different purposes together like Ansible + Terraform that
             | it's something I strongly discourage. I recommend keeping
             | the boundaries and responsibilities of the tools clear. In
             | this case, Terraform for the creation of resources and
             | Ansible for the configuration of those resources. In my
             | opinion, this results in a much simpler and more
             | maintainable ecosystem in the long run.
        
             | [deleted]
        
           | pram wrote:
           | Ansible is also amazing with Packer as the provisioner.
        
         | kall wrote:
         | Your intuition is right on pulumi. Creating these kinds of
         | extensions is minimal effort. You can start out by adding a
         | "dynamic resource" class to your infra codebase and extract it
         | into a plugin later, or not.
         | 
         | These are not the same as "real" pulumi providers that run
         | across all supported programming languages, but I think they
         | are a good enough fit for the cases you mention.
        
         | crabmusket wrote:
         | I've used https://github.com/Mastercard/terraform-provider-restapi
         | successfully with a cloud provider which provides a
         | suitable HTTP API. There was a bit of fiddling with JSON
         | formatting and their API docs, but it wasn't too hard all in
         | all.
         | 
         | But like you say - now I've done that, I want to do it for
         | every UI that I'm forced to log in to!
        
           | weitzj wrote:
           | Yes. We use the REST API provider extensively to provision
           | Elasticsearch and Kibana.
        
           | MuffinFlavored wrote:
           | What's the backing database for this, the .tfstate file? Do
           | the resources/secrets you create end up getting "backed up"
           | (committed) to git?
        
           | scrollaway wrote:
           | Wow, neat, thanks for the link. Maintained by Mastercard, eh?
           | Anyone else on HN used this or worked on it?
        
         | mason55 wrote:
         | This is how I feel about any kind of configuration after moving
         | all my personal systems to NixOS.
         | 
         | "What do you mean run an installer and update these files..."
        
         | satya71 wrote:
         | Pulumi has dynamic providers. You define the CRUD operations
         | and Pulumi manages the state.
        
           | reilly3000 wrote:
           | After a little grokking it's a surprisingly easy way to
           | manage arbitrary API resources. They are just functions in
           | your language of choice and can accept parameters from any
           | dependency. It's also possible to roll your own provider, but
           | dynamic providers cover all kinds of use cases.
        
         | [deleted]
        
         | vasco wrote:
         | I wish I could use it for everything as well. Every other tool
         | the business depends on just scares me to shit because its
         | config isn't in code and we can't have a proper backup of it.
         | At least Datadog supports terraform for most things, so not
         | only do we manage all our infra through terraform, we manage
         | all our monitoring with it too. I doubt very much I'll ever go
         | back to non-monitoring-as-code, if that's even a term.
         | 
         | Infrastructure, all the monitoring as well as all the on-call
         | rotation configurations (and anything else that is in that
         | loop) should all be in code, and all changes should be reviewed
         | the same way application code is. If they aren't, you can't
         | really trust that you're gonna be alerted properly when things
         | start breaking.
         | 
         | I wish I could use it for personal things too, I'd rather have
         | my bank account settings, my government tax information, yada
         | yada in a personal terraform repository for example. Change of
         | address? Commit a change, check if the plan is good and apply
         | to change it everywhere. Though having lots of experience with
         | Terraform I can only imagine what the equivalent of trying to
         | delete an S3 bucket that still has data in it is for a bank
         | account.
        
           | OJFord wrote:
           | I so agree; I tried/am trying to write an Android provider -
           | currently just have app (un)installation working, and not
           | very well, I expected settings management to be the hard
           | part, but egh.
           | 
           | Why can't everything have nice public APIs! And, while I'm at
           | it, some sort of all-encompassing ticketing system, hell even
           | if it were Jira. 'Pothole', assign local council. Blocked on
           | '2021 roadworks funding increase', backlogged. Assigned to
           | councillor. Won't fix. Ok - maybe I'm not making it sound
           | _great_ , but at least you could see some reasoning, and what
           | the blockers are. Follow the chain to work out that 'communal
           | lobby needs repainting' hasn't been acted on by building
           | management company because, ultimately, of global supply
           | chain disruption and the contractor's supplier's supplier
           | can't get any paint ingredients.
        
       | poisonta wrote:
       | I had mostly similar feelings four years ago. As I understood the
       | reasons behind them, I started respecting more the people at
       | HashiCorp. They are really smart.
        
       | nrvn wrote:
       | One of the biggest reasons that have kept me away from terraform
       | apart from the esoteric language is that terraform modules are
       | always a few steps behind the upstream public cloud
       | offerings.
       | 
       | In the sense that whenever there's a new API or service available
       | in any of public clouds and their official SDKs there will always
       | be a delay before this new service/feature/API will become
       | available in terraform.
       | 
       | First time I encountered it with GKE private clusters 3 or 4
       | years ago. Now it is AWS Keyspaces.
       | 
       | The second biggest reason is that whenever you have a requirement
       | for a hybrid or multicloud setup, you are left with the rigidity
       | of HCL. It is probably doable, but to what end?
       | 
       | Solution: get a real language, write a STATELESS configuration
       | management (IaC) system for your own needs and maintain it. The
       | majority of public and private cloud providers ship SDKs in most
       | popular languages that will help you build your own software
       | solution and reduce your dependence on a third party which I
       | would put under progressing operational risk category.
       | Yaml/json/cue/toml for end user configs would suffice.
       | 
       | Example: for one of my previous projects we built a tool for a
       | hybrid AWS-OpenStack setup, and were managing a dozen busy
       | environments.
        
         | oceanplexian wrote:
         | > Terraform modules are always a few steps being from the
         | upstream public cloud offerings.
         | 
         | My experience has been the exact opposite. Usually Terraform
         | offers support for cloud services long before the vendor
         | provides an SDK or supports it with their own offering (e.g.
         | Cloudformation). There are still dozens of AWS services, for
         | example that have no CF support offered by AWS.
        
         | nrvn wrote:
         | The emphasis on stateless here is that your desired state is
         | described in code that resides in your repository. Actual state
         | is what you have in your cloud. No need to spend time on state
         | format, storage and related logic and complexity.
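
A minimal sketch of the stateless approach described above, assuming both the desired configuration (from the repository) and the actual resources (from the cloud) can be loaded as dictionaries keyed by a stable resource key. The keys and attributes here are hypothetical.

```python
def plan(desired: dict[str, dict], actual: dict[str, dict]) -> dict[str, list[str]]:
    """Diff desired config (from the repo) against what the cloud reports.

    Both arguments map a resource key to its attributes. No state file is
    involved: anything in `actual` but not in `desired` is planned for deletion.
    """
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        key for key in desired.keys() & actual.keys()
        if desired[key] != actual[key]
    )
    return {"create": create, "update": update, "delete": delete}


desired = {"bucket/logs": {"versioning": True}, "bucket/assets": {"versioning": False}}
actual = {"bucket/logs": {"versioning": False}, "bucket/old": {"versioning": False}}
changes = plan(desired, actual)
```

In practice the `actual` listing has to be scoped to resources this tool owns (for example by a tag); otherwise anything created by hand or by another tool would be planned for deletion. That ownership bookkeeping is roughly the job a state file does in Terraform.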
        
         | xyzzy_plugh wrote:
         | This is my preference as well. I've done everything from
         | makefiles and bash scripts to a monolith Go program that
         | statelessly provisions/tears down resources.
         | 
         | Even makefiles are pretty straightforward, though you really
         | want operations to only trigger when checksums differ --
         | timestamps result in a lot of redundant operations. As long as
         | everything is idempotent, it's pretty straightforward.
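
The checksum-over-timestamp point above can be sketched in Python: record a content hash of the inputs after each successful run and skip the (idempotent) action when the hashes are unchanged. The stamp-file layout and file names are assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_if_changed(inputs: list[Path], stamp: Path, action: Callable[[], None]) -> bool:
    """Run `action` only when the inputs' content hashes differ from the last run.

    Unlike make's timestamp check, touching a file without changing its
    content does not trigger the action again. The stamp is written only
    after `action` succeeds, so a failed run is retried next time.
    """
    current = {str(p): sha256_of(p) for p in inputs}
    if stamp.exists() and json.loads(stamp.read_text()) == current:
        return False
    action()
    stamp.write_text(json.dumps(current, sort_keys=True))
    return True


with tempfile.TemporaryDirectory() as d:
    cfg = Path(d) / "network.json"
    cfg.write_text('{"cidr": "10.0.0.0/16"}')
    stamp = Path(d) / ".network.stamp"
    calls: list[str] = []
    first = run_if_changed([cfg], stamp, lambda: calls.append("apply"))
    cfg.touch()  # new mtime, same content: skipped
    second = run_if_changed([cfg], stamp, lambda: calls.append("apply"))
    cfg.write_text('{"cidr": "10.1.0.0/16"}')  # real change: runs again
    third = run_if_changed([cfg], stamp, lambda: calls.append("apply"))
```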
        
       | jounker wrote:
       | What countries have fewer deaths per 100K than Sweden? All of
       | Sweden's neighbors.
        
         | duskwuff wrote:
         | Wrong thread?
        
       | phendrenad2 wrote:
       | Terraform basically gives you what cloud providers should have.
       | AWS/Azure are these overcomplicated web interfaces, or
       | undocumented REST APIs, and Terraform gives you a simpler way to
       | configure stuff.
        
       | danw1979 wrote:
       | This article is not entirely incorrect, but there are some
       | glaring falsehoods, as others have pointed out.
       | 
       | There's a plug for managing Helm related resources with the
       | author's own SaaS product at the end, so I'll file this under
       | "half hearted hit job".
        
       | cube2222 wrote:
       | Terraform definitely has its warts (though, as other commenters
       | wrote, not everything in the article is true, e.g. the
       | reconciliation part): dependency resolution blows up in time as
       | your number of
       | resources grows, so you need to split up your statefiles; it
       | can't passively listen for drift happening in a dataflow-like way
       | (that would be awesome); it's not transactional like
       | CloudFormation (which is more of a tradeoff than a con), and
       | more.
       | 
       | It is however a great improvement over the previous ways of doing
       | things, and probably the best out of the current similar
       | alternatives out there (you might mention Pulumi as a strong
       | contender, especially for AWS glue writing).
       | 
       | And - though as per the disclaimer, I may be biased - until a
       | better tool comes up, I'd advise looking for specialized IaC
       | CI/CD tools to ease your path with Terraform, like Spacelift[0].
       | 
       | It can help you with orchestrating dependencies among multiple
       | state files; take care of scheduling regular drift
       | detection/reconciliation without getting in your way or locking
       | your state; gives you a policy system for making sure preventable
       | mistakes don't happen (i.e. recreating a resource you definitely
       | never want to recreate); manages your credentials depending on
       | whether you just want to run a plan, or apply your changes, and
       | much more.
       | 
       | I can't imagine doing Terraform again without a tool like it.
       | 
       | Disclaimer: Software Engineer at Spacelift. If you want me to
       | expand on the "and much more" part, you can find a demo-
       | scheduling link in my bio!
       | 
       | [0]: https://spacelift.io
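
A policy like the one mentioned above ("recreating a resource you definitely never want to recreate") can be approximated with a small check over a machine-readable plan. This Python sketch assumes the structure printed by `terraform show -json <planfile>`, where each `resource_changes` entry has `address`, `type` and `change.actions`, and a replacement lists both `delete` and `create`. The protected types are hypothetical, and this is not how Spacelift implements its policies.

```python
import json

# Hypothetical list of resource types that must never be destroyed or replaced.
PROTECTED_TYPES = {"aws_db_instance", "aws_s3_bucket"}


def replacement_violations(plan: dict, protected: set[str] = PROTECTED_TYPES) -> list[str]:
    """Return addresses of protected resources the plan would destroy or replace."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        if rc["type"] in protected and "delete" in actions:
            violations.append(rc["address"])
    return violations


sample_plan = json.loads("""
{
  "resource_changes": [
    {"address": "aws_db_instance.main", "type": "aws_db_instance",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_instance.web", "type": "aws_instance",
     "change": {"actions": ["delete", "create"]}},
    {"address": "aws_s3_bucket.logs", "type": "aws_s3_bucket",
     "change": {"actions": ["update"]}}
  ]
}
""")
violations = replacement_violations(sample_plan)
```

A CI step could fail the run whenever `violations` is non-empty, before anyone gets to `apply`.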
        
       | 0xbadcafebee wrote:
       | Terraform is actually kind of a nightmare. It's deceptively
       | simple yet requires a massive amount of real world expertise to
       | use it properly. It's a configuration management tool, but more
       | difficult to use and extend.
       | 
       | I'm thinking of designing a series of tools to replace Terraform.
       | The idea would be to break down how modern cloud environments are
       | managed into a couple concepts, and then make a variety of tools
       | that work within those concepts together, so that it's easy to
       | expand and modify the way you use them for your use case. This
       | would enable things like tailoring the use of the tools to a
       | particular deployment strategy, or adding custom business logic,
       | or replacing individual functionality, without being tied to one
       | tool, language, etc.
        
       | kleinsch wrote:
       | Surprising that the author is writing a tool for managing your
       | servers, so writes a post about how Terraform isn't great at
       | managing your servers...
       | 
       | It seems like the root of the problem is that the author wants to
       | use Terraform to manage their AWS state, but also wants to use
       | the web console to directly change things, so Terraform gets out
       | of sync. Terraform has a command to handle this -
       | https://www.terraform.io/docs/cli/commands/refresh.html
        
         | throwaway894345 wrote:
         | As the sibling commenter notes, Terraform has a refresh flag,
         | but I wonder if Kubernetes' model is better here. Rather than a
         | one-off process that tries to update everything, Kubernetes has
         | many small controllers which are essentially processes on the
         | cluster that just run a control loop. Each controller
         | corresponds to one resource type, so it will just loop over
         | incoming events that pertain to the resource type in question
         | and attempt to reconcile the state of the target resource
         | instance with the desired state. If it fails initially, it will
         | retry with back off. If something doesn't stabilize after
         | several minutes, an alert can notify a human.
         | 
         | The key differences between the controller approach and the IaC
         | approach are, I think, lots of little processes continuously
         | reconciling state for all resources of a given type (many small
         | loops that touch all resources of a given type on the entire
         | cluster) versus a one-off process that tries to touch just the
         | resources it cares about and if it fails it just gives up.
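
The controller approach described above can be sketched for a single resource in Python: observe live state, correct whatever differs from the desired state, and retry with jittered exponential backoff. The `observe` and `apply_change` callables are hypothetical stand-ins for a real API; actual Kubernetes controllers are event-driven (watches and work queues) rather than a bounded loop like this.

```python
import random
import time
from typing import Callable


def find_drift(live: dict, desired: dict) -> dict:
    """Fields whose live value differs from the desired value."""
    return {k: v for k, v in desired.items() if live.get(k) != v}


def reconcile(
    observe: Callable[[], dict],
    apply_change: Callable[[str, object], None],
    desired: dict,
    max_attempts: int = 5,
    base_delay: float = 0.1,
) -> bool:
    """Drive one resource toward `desired`; False means a human should be alerted."""
    for attempt in range(max_attempts):
        drift = find_drift(observe(), desired)
        if not drift:
            return True
        try:
            for field, value in drift.items():
                apply_change(field, value)
        except Exception:
            # This sketch treats every API error as transient and retries.
            pass
        # Jittered exponential backoff before observing again.
        time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
    return not find_drift(observe(), desired)


# Usage with a fake API whose first write fails.
live_state = {"replicas": 1}
outages = iter([True])


def observe() -> dict:
    return dict(live_state)


def apply_change(field: str, value: object) -> None:
    if next(outages, False):
        raise ConnectionError("API temporarily unavailable")
    live_state[field] = value


converged = reconcile(observe, apply_change, {"replicas": 3}, base_delay=0.01)
```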
         | 
         | One thing Kubernetes _definitely_ improves upon Terraform is
         | that Kubernetes uses a YAML "assembly language" for its infra
         | as code, but that YAML could be generated by a real programming
         | language. Terraform expects you to write HCL, which is an
         | accidentally re-invented programming language (every IaC tool
         | provider thought static configs like YAML would suffice as a
         | human interface, but as they gradually realized the need for
         | more dynamism, Terraform and others would bolt on one dynamic
         | feature after another until they had a slow, unfamiliar, and
         | counterintuitive programming language). Terraform has a CDK
         | that allows writing in other languages, but I'm skeptical that
         | it liberates you from Terraform's model of the world (e.g., if
         | I rename a variable in CDK, does it try to destroy and recreate
         | the underlying resource as with Terraform?). I'm also concerned
         | that rather than allowing us to generate YAML in the obvious
         | way, it will require bizarre inheritance patterns like the AWS
         | CDK. I would be curious to hear from folks who have used the
         | CDK.
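
The "YAML as assembly language" idea above, sketched in Python: manifests are built as plain data with ordinary functions and loops, then serialized. JSON output is used here because JSON is also valid YAML, so the result can stand in for a YAML manifest. The app name, image and per-environment replica counts are hypothetical.

```python
import json


def deployment(name: str, image: str, replicas: int, env: str) -> dict:
    """Build a Kubernetes Deployment manifest as plain data."""
    labels = {"app": name, "env": env}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-{env}", "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical per-environment settings; ordinary dicts and loops replace
# templating or HCL-style dynamic blocks.
ENVIRONMENTS = {"staging": 1, "production": 3}
manifests = [
    deployment("web", "example/web:1.4.2", replicas, env)
    for env, replicas in ENVIRONMENTS.items()
]

print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

Renaming a Python variable here changes nothing in the output, which is one answer to the rename question raised above for this generate-then-apply style.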
        
           | the_duke wrote:
           | Terraform is also working on allowing actual code:
           | https://github.com/hashicorp/terraform-cdk
        
             | throwaway894345 wrote:
             | I know, I mention that in my comment :)
        
           | ClumsyPilot wrote:
           | "every IaC tool provider thought static configs like YAML
           | would suffice as a human interface, but as they gradually
           | realized the need for more dynamism, Terraform and others
           | would bolt on one dynamic feature after another until they
           | had a slow, unfamiliar, and counterintuitive programming
           | language"
           | 
           | I constantly see so many people step on the same rake; it's
           | incredible. Tools like Tilt let you use python, it's a much
           | more sensible approach.
        
           | leg100 wrote:
           | The Hashicorp co-founders considered alternative approaches
           | when originally designing terraform. The actor model was
           | considered but dismissed. That's not a million miles away
           | from the kubernetes reconcile loop model:
           | 
           | > Then we transitioned to an actor-based model where each
           | resource was almost an actor, and there was a message-passing
           | interface between them.
           | 
           | > This allowed the system to be highly concurrent the way
           | Terraform is today, but also confusing for users to deal with
           | and very difficult to build a programming model around,
           | because the ordering of execution was so random and
           | everything was happening concurrently.
           | 
           | https://www.hashicorp.com/resources/terraform-fireside-chat-...
           | 
           | They may still be right. Kubernetes' approach may seem more
           | attractive but terraform is far more pragmatic in its design.
        
           | verdverm wrote:
           | Terraform accepts JSON format as an alternative to HCL.
           | 
           | I prefer CUE to JSON for TF and many other tools now
        
             | throwaway894345 wrote:
             | To be clear, the issue isn't the HCL syntax. You could
             | similarly use Cue to generate HCL. The problem is using
             | Terraform's dynamic features which were poorly designed.
        
           | steveb wrote:
           | There are a lot of developments around using Kubernetes as an
           | IaC platform for the reasons in your comment. The combination
           | of a standard API model in CRDs + the controller model maps
           | nicely to managing infrastructure and exposing resources to
           | developers.
           | 
           | <https://crossplane.io> just graduated to CNCF Incubation and
           | each of the cloud providers is working on K8s controllers
           | and code generators (like AWS Controllers for Kubernetes,
           | Google Config Connector, and the Azure Service Operator).
        
           | wernerb wrote:
           | The reasoning you put in also means kubernetes is unsuited to
           | being controlled by terraform. There are too many lifecycles
           | (resources) to control centrally. Kubernetes custom resources
           | can have dependencies on others, which terraform would need
           | to support as well, and that is not feasible to maintain.
           | Keep your kubernetes manifests outside of your terraform
           | state.
        
             | throwaway894345 wrote:
             | My company manages a lot of Kubernetes manifests with
             | Terraform without issue. Terraform is just generating the
             | manifests in this case; Kubernetes is doing the
             | reconciliation work. More complex than is ideal (i.e., if
             | we were starting out with Kubernetes we probably wouldn't
             | use Terraform) but it works reasonably well.
        
         | orf wrote:
         | There's also a refresh flag on plan. It's always worth using
         | before your apply CI step.
        
       | snom380 wrote:
       | > On terraform it's different, because of the tfstate. All the
       | deployed elements are stored in the tfstate, re-running terraform
       | won't update resources that are supposed to be in a specific
       | state but are not.
       | 
       | This is incorrect, and makes me wonder how the author has used
       | terraform. Terraform will certainly detect differences between
       | managed and current state for the resources it manages whenever
       | you do a plan/apply.
       | 
       | The major challenge is that terraform _can only reconcile
       | resources or configuration values it knows about_, and that
       | depends very much on how a particular cloud vendor or terraform
       | provider has modelled resources. I believe the Helm provider is
       | one example where it (at least in the past) hasn't had a good
       | way to reconcile state.
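A minimal sketch of the refresh-then-diff behaviour snom380 describes,
assuming a toy model in which every resource is a flat dict of
attributes keyed by an address such as `aws_s3_bucket.logs`. `plan` and
`read_remote` are hypothetical names, not Terraform internals;
`read_remote` stands in for a provider's read call and returns `None`
when the object no longer exists. The sketch also shows the limit
snom380 points out: only addresses recorded in state are ever read
back, and only attributes written in the configuration are compared.

```python
from typing import Callable, Dict, List, Optional, Tuple

Attrs = Dict[str, object]
# read_remote(address, recorded_attrs) -> live attrs, or None if the object is gone
Reader = Callable[[str, Attrs], Optional[Attrs]]


def plan(config: Dict[str, Attrs],
         state: Dict[str, Attrs],
         read_remote: Reader) -> List[Tuple[str, str]]:
    """Return sorted (symbol, address) pairs: '+' create, '~' update, '-' destroy."""
    # Refresh: only addresses already recorded in state are looked up.
    refreshed: Dict[str, Optional[Attrs]] = {
        addr: read_remote(addr, attrs) for addr, attrs in state.items()
    }
    actions: List[Tuple[str, str]] = []
    for addr, desired in config.items():
        live = refreshed.get(addr)
        if live is None:
            # Never created, or deleted outside Terraform: (re)create it.
            actions.append(("+", addr))
        elif any(live.get(key) != value for key, value in desired.items()):
            # Only attributes present in the configuration are compared.
            actions.append(("~", addr))
    for addr, live in refreshed.items():
        if addr not in config and live is not None:
            # Still exists but was removed from the configuration.
            actions.append(("-", addr))
    return sorted(actions, key=lambda action: action[1])
```

With a bucket whose versioning was switched off by hand and a database
deleted outside Terraform, this sketch plans an update for the first
and a re-creation for the second, which is what one would expect from
a correctly implemented provider.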
        
         | forty wrote:
         | The stage where Terraform reads reality to compare it to the
         | current state is called "refresh". It can optionally be
         | skipped.
        
         | shadycuz wrote:
         | I also agree, but I don't have experience with the providers he
         | is using.
         | 
         | When working with AWS it will always reapply the terraform
         | configuration.
         | 
         | A great use of this is for account hardening. You can run it
         | daily to make sure it's configured correctly.
        
           | OJFord wrote:
           | It would be even better if you could somehow tell it it
           | should have control over the entire account, so that anything
           | (entire resources I mean, not just changed properties)
           | created outside terraform would be destroyed.
           | 
           | In terms of API use though I suppose that'd be quite
           | expensive to plan - listing every possible AWS resource (in
           | every region!) for example.
        
             | joombaga wrote:
             | You'd have to go per-region first to avoid a major redesign
             | on the provider (the API is also per-region). I can see
             | something like a `controlled_resource_types` list attribute
             | on the provider that you could set to e.g. `[aws_instance]`
             | to inform the provider it needs to compare the list of
             | resources of the specified type to the state.
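The comparison step behind joombaga's suggestion could look roughly
like the sketch below. `controlled_resource_types` is only a proposal
in the comment above, not an existing provider attribute, and the
function names here are hypothetical. The sketch assumes state exported
with `terraform show -json` (whose `values.root_module` holds
`resources` and nested `child_modules`) and a list of live IDs for one
resource type in one region, obtained some other way (for example from
the EC2 API). Anything live but absent from state is reported as
unmanaged.

```python
from typing import Iterable, Set


def managed_ids(state_json: dict, resource_type: str) -> Set[str]:
    """IDs of managed resources of one type in `terraform show -json` output."""
    ids: Set[str] = set()
    # Walk the root module and every nested child module.
    stack = [state_json.get("values", {}).get("root_module", {})]
    while stack:
        module = stack.pop()
        for res in module.get("resources", []):
            # Skip data sources (mode "data"); they are read, not owned.
            if res.get("mode") == "managed" and res.get("type") == resource_type:
                res_id = res.get("values", {}).get("id")
                if res_id is not None:
                    ids.add(res_id)
        stack.extend(module.get("child_modules", []))
    return ids


def unmanaged(state_json: dict, resource_type: str,
              live_ids: Iterable[str]) -> Set[str]:
    """Live IDs of `resource_type` that Terraform state does not know about."""
    return set(live_ids) - managed_ids(state_json, resource_type)
```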
        
         | sciurus wrote:
         | Relatedly, I'm pretty sure the earlier statement
         | 
         | > For some resources like RDS or EKS, it won't check if the
         | resource already exists or not. So if it's missing, nothing is
         | going to happen as it's marked are deployed in the tfstate file
         | 
         | is also wrong.
        
           | joombaga wrote:
           | It is, at least for most RDS and EKS resources (RDS cluster,
           | cluster instance, parameter group, EKS cluster)
        
         | raffraffraff wrote:
         | There's also brokenness around terraform for_each and
         | providers. If you have a module that creates a Kubernetes
         | cluster and then applies a helm chart to it, you can't convert
         | it to one that takes a bunch of cluster definitions that it
         | iterates over using for_each. Basically, there is no way to do
         | this in a data-driven way. Sucks.
        
         | robertlagrant wrote:
         | The author also seems to think DSL means "descriptive
         | languages" and that Helm "even" supports kubernetes, when in
         | fact it's only for that technology.
        
         | dead10ck wrote:
         | This stuck out to me too. Terraform absolutely does check the
         | current reality of the state and applies changes to do what the
         | HCL tells it to. They are either using terraform in a really
         | weird way, or this article was written by someone that doesn't
         | actually run terraform themselves.
        
           | jrochkind1 wrote:
           | Phew! I thought I was misunderstanding terraform when I saw
           | that.
           | 
           | Perhaps they are working with certain poorly implemented or
           | buggy providers, and not realizing those providers were doing
           | something different than terraform's properly working
           | behavior that most providers implemented.
           | 
           | Bugs happen, but the first step is agreeing on intended
           | behavior so we know what's a bug!
        
       | marcinzm wrote:
       | We use Terraform for cloud infrastructure and, basically, helm
       | deploys of external apps. In other words things that don't change
       | too often and the update of which has to be managed carefully
       | anyway. The internal apps which get deployed at a much faster
       | cadence use helm directly.
        
       | danielovichdk wrote:
       | I run only on Azure. Nearly all my IaC is written in PowerShell
       | utilizing the Azure CLI.
       | 
       | Terraform and YAML as well are so verbose, and you have no clue
       | what's going on from the local side of things.
       | 
       | How do you debug your Terraform markup locally? My guess is you
       | can't.
        
         | awslattery wrote:
         | The console command [1], mixed with local variables, is quite
         | handy for debugging locally.
         | 
         | 1. https://www.terraform.io/docs/cli/commands/console.html
        
           | x86_64Ubuntu wrote:
           | Seems as if people have very strong feelings about Terraform,
           | but little actual experience using it.
        
             | toyg wrote:
             | It's not the easiest tool to grok, writing configuration
             | files can be very verbose, and its opinions about a number
             | of things can be very off-putting. I tried to "get into" TF
             | many times and never actually fell in love. I like the
             | idea, I like the cross-cloud approach, I just don't
             | particularly like what the experience ends up being. This
             | shit should be easier.
        
       | mvanaltvorst wrote:
       | I wish Terraform were less opinionated. It has a very clear set
       | of rules you have to adhere to, and if you try to do anything
       | remotely complex you will encounter barriers left and right.
       | 
       | An example is the fact that `for_each` is not supported on
       | providers [1], an issue with 230 likes that has remained unsolved
       | since January 2019. This had me resort to a Python script which
       | generates a `.tf.json` file, definitely not ideal. Infrastructure
       | as code sounds great, but in practice it's closer to
       | "infrastructure as a non-standard markup language".
       | 
       | [1]: https://github.com/hashicorp/terraform/issues/19932
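A minimal sketch of the kind of generator mvanaltvorst describes,
assuming the goal is one AWS provider configuration per region plus one
resource pinned to each. It relies on Terraform's JSON syntax, in which
several `provider` blocks of the same type are written as an array and
a resource's `provider` argument is a string such as `"aws.us_east_1"`.
The region list, the `logs_*` resource names and the bucket naming
scheme are hypothetical.

```python
import json
from typing import Dict, List


def generate(regions: List[str]) -> Dict[str, dict]:
    """Build a Terraform JSON-syntax document with one aliased AWS provider
    and one (hypothetical) log bucket per region."""
    providers = []
    buckets = {}
    for region in regions:
        alias = region.replace("-", "_")  # e.g. "us-east-1" -> "us_east_1"
        providers.append({"alias": alias, "region": region})
        buckets[f"logs_{alias}"] = {
            "provider": f"aws.{alias}",          # provider reference as a string
            "bucket": f"example-logs-{region}",  # hypothetical naming scheme
        }
    return {
        # Several provider blocks of the same type become a JSON array.
        "provider": {"aws": providers},
        "resource": {"aws_s3_bucket": buckets},
    }


if __name__ == "__main__":
    # Redirect stdout to a *.tf.json file next to the rest of the configuration.
    print(json.dumps(generate(["us-east-1", "eu-west-1"]), indent=2))
```

Terraform loads `*.tf.json` files alongside `.tf` files. The loop works
only because the region list is known when the script runs, so, like a
hand-written provider block, the generated configuration cannot depend
on values computed by other Terraform resources.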
        
         | throwaway894345 wrote:
         | You have to understand that when IaC was new, the marketing was
         | "it's so simple you can just write YAML/JSON/etc" because
         | frankly the industry was too dumb to understand that "using a
         | real programming language to _generate a description of the
         | desired resource state_" and "using a real programming
         | language to imperatively reconcile the current and desired
         | states oneself" are different things. So Terraform began with
         | something that resembled YAML in its static-ness, and over
         | time, more power was required so they would bolt on a dynamic
         | feature but were reluctant to give the impression that they
         | were building a programming language so the feature would be as
         | obscure as possible. But that wouldn't be enough either so they
         | would add still more dynamic features, each comparably obscure
         | until in time they'd built a complete, obscure programming
         | language.
         | 
         | But this wasn't just Terraform! The entire industry did this
         | too. CloudFormation began as simple JSON, but over time they
         | allowed you to encode the abstract syntax tree of a shitty
         | programming language in your YAML, and CloudFormation would
         | interpret it. However stupid that may sound, in the Kubernetes
         | world, we have Helm which lets you generate YAML with _text
         | templates_ which is honestly the dumbest idea in the world
         | (imagine a compiler that generates syntactically invalid
         | machine code if the input program has an extra white space
         | character).
         | 
         | Of course in all of these cases the answer is staring us in the
         | face: use a static language (YAML, JSON, etc) to _describe_ the
         | desired state, and use a higher level language (like Python or
         | Starlark or Dhall or etc) to _generate_ that static desired
         | state description. The only thing Terraform (or any IaC tool)
         | should care about is the YAML description. That it is generated
         | from Starlark or TypeScript is just an implementation detail.
         | 
         | Instead of that, though, we get CDKs, which are _so close_, but
         | admittedly I haven't used them in anger yet.
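A small illustration of throwaway894345's point about text templates
versus generating a static description from a real language. JSON
stands in for YAML only so the sketch needs nothing beyond the Python
standard library (Kubernetes accepts JSON manifests too); both function
names are hypothetical. The text-template version produces an invalid
document as soon as an input contains a quote, while the structured
version lets the serializer handle escaping.

```python
import json


def render_text(name: str, replicas: int) -> str:
    # Text templating: splice values into a string and hope it still parses.
    return '{"metadata": {"name": "%s"}, "spec": {"replicas": %d}}' % (name, replicas)


def render_structured(name: str, replicas: int) -> str:
    # Structured generation: build plain data, let the serializer do quoting.
    return json.dumps({"metadata": {"name": name}, "spec": {"replicas": replicas}})
```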
        
           | x3n0ph3n3 wrote:
           | One of the best parts of CloudFormation was their
           | introduction of Macros. You can take either your whole
           | template or just a snippet, and perform dynamic
           | transformations by calling a lambda. I'll admit I've gone so
           | far as being able to embed ERB (Ruby) into my templates in
           | order to more dynamically define some resources based on
           | stack parameters. I can also create N resources with common
           | configuration based on the values of a CommaDelimitedList.
        
             | throwaway894345 wrote:
             | I think the idea here is that macros are neat in any
             | language, but in CloudFormation they can help automate
             | stuff that is only difficult because of CloudFormation, and
             | the macros themselves are harder to use than those in a
             | normal programming language. In all cases, I think it's
             | strictly less nice than generating your CloudFormation YAML
             | with Python or similar.
        
         | kevincox wrote:
         | I think this is less that it is opinionated and more that the
         | HCL evaluation feels like a pile of hacks. There are unclear
         | rules on what can be evaluated when and what dependencies are
         | possible. Part of this is so that `plan` can work as it does,
         | but it seems like there are just major gaps in general. For
         | example providers can't depend on resources. This makes it very
         | difficult to for example set up EKS then use the kubernetes
         | provider to manage the resources in the cluster. The solution
         | is obviously separate stacks but that brings in a whole bunch
         | of other problems.
         | 
         | I think Terraform is quite possibly the best tool available,
         | but there are clear flaws with both the model and the
         | implementation. I think if I were to make a Terraform v2 I
         | would make `plan` completely pure. This would avoid the
         | provider issues, make validation and testing in CI easier and a
         | whole bunch of other benefits. Of course there are downsides.
         | For example EC2 instance IDs are random so you can't just
         | include them in your pure plan. You would need some type of
         | placeholder that is used for evaluation. This does cause some
         | issues as it limits the operations that you can do with that
         | value (so you can't pick the instance size based on the random
         | instance ID) but overall I don't think it would be a major
         | issue if the final substitution was handled well by the
         | framework.
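A rough sketch of kevincox's "pure plan" idea, under the assumption
that values only the cloud can assign (such as EC2 instance IDs) are
represented by a placeholder during planning and filled in by the
framework at apply time, similar in spirit to the values Terraform's
plan output already marks as "(known after apply)". All names and
attributes here are hypothetical. Passing a placeholder through is
fine, but a decision that inspects it (kevincox's example of choosing
the instance size from the ID) is rejected at plan time.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Unknown:
    """Placeholder for a value only the cloud can assign at apply time."""
    ref: str  # e.g. "aws_instance.web.id"


Value = Union[str, int, Unknown]


def plan_attachment(instance_id: Value) -> Dict[str, Value]:
    """Pure planning step: no API calls, same input -> same output.
    The attribute names are hypothetical."""
    return {"instance_id": instance_id, "name": "web-eip"}


def size_for(instance_id: Value) -> str:
    """A decision that inspects the value cannot be made while it is unknown."""
    if isinstance(instance_id, Unknown):
        raise ValueError(f"{instance_id.ref} is not known until apply")
    return "large" if str(instance_id).endswith("0") else "small"


def substitute(planned: Dict[str, Value],
               known: Dict[str, Value]) -> Dict[str, Value]:
    """Apply-time step run by the framework: swap placeholders for real values."""
    return {key: known[value.ref] if isinstance(value, Unknown) else value
            for key, value in planned.items()}
```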
        
       | wayoutthere wrote:
       | The new use case for Terraform is IT departments using it as an
       | IaC policy control system for self-service. Rather than push
       | teams through a web interface, you just expose Terraform via
       | Terragrunt to the dev teams and run their files through a policy-
       | driven linter before executing. And make it so that Terraform is
       | the _only_ way you can push to prod.
       | 
       | I think people get into trouble with Terraform when they try to
       | use it to do more than provision infrastructure. Things that
       | should probably be part of build scripts, CI/CD pipelines or
       | config management. Terraform isn't good at those things; but it
       | is very good at provisioning cloud infra in a cloud-agnostic
       | fashion.
        
         | pvtmert wrote:
         | I agree with most of your comment except the cloud-agnostic part.
         | 
         | Heck, I never get how Terraform can be cloud-agnostic... If
         | everybody thinks having the same language (HCL) is equivalent to
         | being cloud-agnostic, YAML exists...
         | 
         | It is literally impossible to create a simple VM in 2
         | different cloud providers without defining it twice with
         | each provider's specific parameters.
         | 
         | If you use the AWS provider, resources start with aws_; if it's
         | GCP, they start with google_, and so on. It is not possible to
         | have a "resource vm { name = ... provider = aws }"
        
           | jen20 wrote:
           | Terraform provides the same _workflow_ across clouds, not the
           | same resource model (which would be dumb, since it would
           | necessarily provide only a lowest common denominator
           | representation).
           | 
           | > It is not possible to have a "resource vm { name = ...
           | provider = aws }"
           | 
           | It actually is via modules. It's a lot of work with basically
           | no benefit though, so in practice people don't do this.
           | 
           | Pulumi is better at specifically this kind of thing however,
           | since you can implement a common interface which can be
           | specialised for each available cloud.
        
       | airocker wrote:
       | We use this exact stack, but we generally would not rely on
       | tfstate. We remove everything and regenerate it. It's not done
       | often enough for that to be a big problem. Also, we use Helm as a
       | separate layer that is applied after Terraform and can be
       | repeated many times; that layer changes often.
        
       | jdub wrote:
       | My biggest issue with Terraform is the impedance mismatch between
       | HCL and the rest of the known universe.
       | 
       | When you write a provider, you spend half your time converting
       | data structures from the HCL submitted to your provider into the
       | JSON your target service inevitably expects, then you spend half
       | your time converting the JSON your target service inevitably
       | returns into HCL for Terraform to consume, and then you spend
       | another half of your time fixing bugs and polishing.
       | 
       | It's okay when you're building simple providers, but anything
       | reasonably complex becomes unwieldy. I had a go at building some
       | providers for AWS services that were not supported by Terraform
       | or CloudFormation... and I just retreated to cheesy Lambda custom
       | resources for CloudFormation.
        
         | gtirloni wrote:
         | The upside though is that you're adapting that known universe
         | to a way of working that makes sense for terraform _users_.
        
           | ClumsyPilot wrote:
           | Good tools are fit for the world we live in, bad tools
           | require you to "adapt the known universe" to the tool
        
           | jdub wrote:
           | But I wanted to be a Terraform user. So an alternative
           | interpretation is that, as designed, Terraform is slowing
           | down adaptation of the known universe for Terraform users.
        
           | throwaway894345 wrote:
           | Unfortunately HCL is also an unnecessary learning curve for
           | users. I still don't have my head around a list versus a
           | bunch of blocks with the same name, for example. I originally
           | thought it was syntax sugar for a single mechanism, but I've
           | had errors for trying to use one instead of the other before.
        
         | kubanczyk wrote:
         | I read you. Half of Terraform's value today is "let's
         | have common aesthetics - especially the examples and
         | snake_case_naming_convention - and convert _any_ REST API in
         | the world to that ".
         | 
         | A typical example is to start here:
         | 
         | https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_V...
         | 
         | and arrive here:
         | 
         | https://registry.terraform.io/providers/hashicorp/aws/latest...
         | 
         | If you review it carefully, it is apparent how much coding
         | effort and how many moving parts went into performing a
         | transformation which seems disproportionately primitive.
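To make kubanczyk's point concrete: a naive mechanical mapping between
an AWS-style CamelCase payload and Terraform-style snake_case
attributes might look like the sketch below (function names
hypothetical, not taken from any provider). It handles `Ipv6CidrBlock`
to `ipv6_cidr_block`, but the reverse mapping is lossy for acronyms
(`dns_name` becomes `DnsName`, not `DNSName`), which hints at why
providers end up with so much hand-written per-field code.

```python
import re
from typing import Any, Callable


def to_snake(name: str) -> str:
    """'Ipv6CidrBlock' -> 'ipv6_cidr_block', 'DNSName' -> 'dns_name'."""
    name = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)  # split acronym runs
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)      # split camel humps
    return name.lower()


def to_camel(name: str) -> str:
    """'ipv6_cidr_block' -> 'Ipv6CidrBlock'; acronyms are NOT restored."""
    return "".join(part.capitalize() for part in name.split("_"))


def convert_keys(obj: Any, fn: Callable[[str], str]) -> Any:
    """Recursively rename dict keys in a JSON-like structure."""
    if isinstance(obj, dict):
        return {fn(key): convert_keys(value, fn) for key, value in obj.items()}
    if isinstance(obj, list):
        return [convert_keys(item, fn) for item in obj]
    return obj
```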
        
         | reilly3000 wrote:
         | I'm partial to Pulumi for this reason. It allows devs to use
         | familiar languages to define infrastructure with their familiar
         | tools, write tests, and even interop with existing terraform.
        
       | dharmab wrote:
       | An alternative opinion: When I worked at a large tech company, my
       | team made a conscious decision to not use Terraform. This gave us
       | some key advantages: we were able to adopt new cloud features
       | immediately, months before they were available in tf, and our
       | direct cloud access let us build features that would surprise the
       | teams using tf within the company.
       | 
       | If your core competency isn't dependent on your cloud platform, tf
       | is a great tool. But using cloud APIs directly was great for us.
        
         | jrsdav wrote:
         | > But using cloud APIs directly was great for us
         | 
         | This is fine, I've done it extensively myself for some of the
         | bleeding-edge cloud stuff, but the importance of things like
         | tracking state, managing hierarchical resource dependencies, or
         | retry/back-off logic shouldn't be tossed aside simply because
         | there are gaps in what's available in the Terraform providers.
         | Especially where change management is important (basically any
         | enterprise company).
         | 
         | I'd caution others reading this against abandoning something
         | altogether and writing bespoke IaC tooling simply because the
         | stable approach doesn't cover every (bleeding) edge case.
         | 
         | You'll spend a lot of time reinventing the wheel, and while
         | it's fine for certain situations (like when you only care about
         | desired state, not known state, for instance), you'll move
         | faster (and likely safer) by sticking with tools like Terraform
         | for the bulk of your infra, and augmenting here and there with
         | cloud APIs/SDKs when needed.
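As an example of the plumbing jrsdav mentions, retry/back-off logic in
bespoke tooling might start as something like this: a generic sketch of
capped exponential backoff with full jitter, not how Terraform or any
particular SDK implements retries. Which exceptions count as retryable
is an assumption (`TimeoutError` by default); a real client would
usually retry on throttling and 5xx responses. `sleep` is injectable so
the function can be exercised without waiting.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(call: Callable[[], T],
                 retryable: Tuple[Type[BaseException], ...] = (TimeoutError,),
                 attempts: int = 5,
                 base_delay: float = 0.5,
                 max_delay: float = 20.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); on a retryable error wait a random time in
    [0, min(max_delay, base_delay * 2**attempt)] and try again.
    The last error is re-raised once attempts are exhausted."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # "full jitter"
    raise AssertionError("unreachable")
```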
        
           | dharmab wrote:
           | Yes, we did have to implement our own state tracking,
           | retries/recovery, etc., but since we were focused on a limited
           | subset of the cloud API, this was pretty easy.
        
             | Thaxll wrote:
             | So you re-implemented Terraform, most likely worse. Also,
             | you could have added those missing features and reused the
             | TF engine; it's very simple to add new APIs to an
             | existing provider.
        
         | goodpoint wrote:
         | Same here. It's incredible how much effort developers are
         | willing to put into popular "devops" tools when the job could be
         | done faster with 200 lines of Python.
        
         | iddqd wrote:
         | It usually takes 1-2 years for AWS to roll out their latest
         | updates to the regions I use and by then Terraform is stable.
        
         | oneplane wrote:
         | Would the time spent re-implementing a specialised Terraform
         | subset be better spent simply maintaining a private branch of
         | the AWS Provider? You can add your secret/special API without
         | having to do all the other heavy lifting as well.
         | 
         | This makes your own effort for customisation minimal, keeps
         | your knowledge portable and because your added features can be
         | separated into different files and the provider API is stable
         | you can also easily backport/fast-forward new changes.
        
         | CSDude wrote:
         | What new cloud features were not available in Terraform for
         | months?
        
           | dharmab wrote:
           | Private preview features for partnered organizations. We had
           | access up to 6 months before the public.
        
           | pvtmert wrote:
           | E.g., AWS Chatbot is not available in TF yet. TBH, AWS hasn't
           | even added it to their Go SDK, so I cannot blame TF. But
           | anyway, that's one of the inherent problems of the TF plugin
           | system.
           | 
           | Compare that to kubectl, where you can write plugins in
           | bash/shell, mark them with the execute bit, put them somewhere
           | in your $PATH as kubectl-blabla and use them as "kubectl blabla".
        
             | CSDude wrote:
             | It's not fair to compare imperative simple shell scripts
             | with the things Terraform does. It has schema validation,
             | state comparison, retries, failure handlers etc.
             | 
             | Also, just as you can write extensions to kubectl, you can
             | write your own provider in Terraform if one does not exist.
             | See https://registry.terraform.io/modules/waveaccounting/ch
             | atbot...
             | 
             | Also, Chatbot does not have a public API; that's why it's
             | only configured via CloudFormation. So the expectation is
             | not fair either.
             | 
             | I've seen CloudFormation getting features years later, e.g.
             | 
             | 2021 - https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-dy...
             | 2015 - https://aws.amazon.com/about-aws/whats-new/2015/07/amazon-dy...
        
               | jen20 wrote:
               | NAT Gateways are another notable feature that took
               | CloudFormation months to support, yet Terraform had on day 1.
               | 
               | If you can configure something via CloudFormation you can
               | integrate it via Terraform et al also, since they have
               | resources representing CloudFormation stacks.
        
               | lincler wrote:
               | This! It's not like you can't go beyond what Terraform
               | offers by default. Running CloudFormation stacks from
               | Terraform is a neat way of solving missing
               | APIs/integrations. That's exactly what my team did
               | when Terraform was missing a lot of Lambda
               | functionality. We just declared the CloudFormation
               | stack for Lambdas and then called it from Terraform.
        
             | jen20 wrote:
             | There's no reason you can't do something similar with
             | Terraform either - plugins speak gRPC and thus could be
             | implemented in Python, Node.js or Rust.
             | 
             | However, if AWS have not published metadata for a given
             | service to be used across their various SDKs, it's hard to
             | take that service particularly seriously, so I'm not sure
             | I'd bother with this.
        
       | bovermyer wrote:
       | From the problems the author is having, it would appear that
       | perhaps Pulumi would be the better choice in this case.
        
         | rgoulter wrote:
         | In the article, the problems the author discusses are:
         | 
         | 1. Inconsistent behaviour between providers. e.g. if a resource
         | has been destroyed since the last `terraform apply`, then some
         | resources/providers would recreate the resource, others
         | wouldn't. (Similarly, there's not a guarantee that the state
         | after running `terraform apply` matches up with what's there,
         | if the provider is happy with its state file).
         | 
         | 2. The dependencies of already-applied resources can block
         | `terraform apply` if the upstream API for these resources
         | suffers an outage.
         | 
         | 3. If a `terraform apply` applies some resources before
         | failing, this can result in an inconsistent state. Either the
         | resources need to be deleted, or imported.
         | 
         | I'm not familiar with Pulumi; what aspects of these would
         | Pulumi help with?
        
           | kall wrote:
           | 3 happens regularly and I don't see how the other two would
           | really be different, since some pulumi providers are using
           | terraform providers under the hood.
        
       | p2t2p wrote:
       | My experience is that terraform sucks. A lot. Yet everything else
       | seems to suck even more.
        
       | jrochkind1 wrote:
       | > On terraform it's different, because of the tfstate. All the
       | deployed elements are stored in the tfstate, re-running terraform
       | won't update resources that are supposed to be in a specific
       | state but are not.
       | 
       | Huh, is that true?
       | 
       | I'm just getting started with terraform but I assumed that was
       | the idea of terraform (where it didn't happen would be a bug),
       | and I think I had seen it happening for the few basic resources I
       | have started out with (S3, cloudfront).
       | 
       | If the state doesn't match the actual configuration of S3,
       | terraform notices, and the plan is to make it so. No? Am I
       | confused and it hasn't been doing this?
       | 
       | Or is this inconsistent, true of some resources and not of
       | others? That seems surprising. What's the idea?
        
         | snom380 wrote:
         | It's not true, and what you describe is correct.
         | 
         | I can only assume that the author has used some provider that
         | hasn't implemented this properly (helm, I believe, is one
         | example), or that they've run into one of the cases where
         | terraform treats configuration/attachments to a resource as a
         | separate resource (e.g. IAM role vs attached/inline IAM
         | policies).
        
         | shatteredspace wrote:
         | If Terraform was used for the deployment of the infrastructure,
         | then state IS the actual configuration of the system.
         | 
         | All that a plan does is evaluate what is going to change in the
         | current Terraform state by performing a dry-run of the
         | Terraform code that you have supplied.
         | 
         | If you would actually like to make changes to the Terraform
         | state based on what the Terraform code evaluated then you run a
         | Terraform apply - which will, for the resources deployed via
         | Terraform, update the configurations themselves and update the
         | Terraform state by using the Terraform code as the instruction
         | set.
         | 
         | You can actually see this in action with plan and apply, as the
         | output will show you +, -, and ~, where ~ marks settings that are
         | going to change but are not new configurations or configurations
         | to be removed.
         | 
         | Edit: Learned from some other comments that Terraform has a
         | 'refresh' command that takes configuration changes made outside
         | of Terraform after deployment and syncs them into the state.
         | This might be what you ideally are looking for after
         | deployments?
        
           | jrochkind1 wrote:
           | Right. I guess I'm asking about what happens if state changed
           | outside of terraform.
           | 
           | I thought I had seen terraform correcting it (to match what
           | terraform thinks it should be) in some cases.
           | 
           | OP seems to suggest that in some cases it does and in other
           | cases it doesn't. I am surprised if that is inconsistent and
           | unpredictable, and would have expected terraform to (modulo
           | bugs) either always or never do that. And am wondering what
           | terraform's intent is with that.
        
             | shatteredspace wrote:
             | Made an edit to my original comment but it may help to be
             | here also.
             | 
             | 'terraform refresh' may be what you are looking for. This
             | will update the state to match current configurations that
             | may have been done outside of Terraform.
        
               | snom380 wrote:
               | From
               | https://www.terraform.io/docs/cli/commands/refresh.html :
               | 
               | > You shouldn't typically need to use this command,
               | because Terraform automatically performs the same
               | refreshing actions as a part of creating a plan in both
               | the terraform plan and terraform apply commands.
        
               | jrochkind1 wrote:
               | I'm talking about fixing the external actual
               | configuration that has diverged, to match what terraform
               | config wants it to be.
               | 
               | However, snom380 says this is what terraform is intended
               | to do, and does with properly implemented providers,
               | which makes sense to me.
               | 
               | I'm not sure if you are talking about the same things. We
               | -- me, snom380, and the original quote I made from OP --
               | are talking about what happens when external actually
               | existing live resources have diverged from terraform's
               | state. I understand what you said that under a "perfect"
               | situation, this would not happen. But it does happen
               | sometimes, for various reasons, and what the original
               | quote from OP is talking about, and what I'm talking
               | about, is what happens when it does. I think maybe you
               | are talking about something else.
        
       ___________________________________________________________________
       (page generated 2021-09-19 23:02 UTC)