[HN Gopher] Terraform is not the golden hammer
___________________________________________________________________
Terraform is not the golden hammer
Author : kiyanwang
Score : 134 points
Date : 2021-09-19 10:44 UTC (12 hours ago)
(HTM) web link (hub.qovery.com)
(TXT) w3m dump (hub.qovery.com)
| d0gsg0w00f wrote:
| IME, Terraform is great for "fire and forget" deployments.
| However, if you're trying to use it to continuously bring older
| existing deployed infra up to date it can get tricky. I strongly
| suggest self versioning your TF files and following strict
| pretested upgrade paths.
| robertlagrant wrote:
| If you just use a cloud provider's UI you will have no
| separation between desired and actual states. Then, whatever
| people change it to is the only truth.
| nanis wrote:
| > Terraform is great for "fire and forget" deployments.
|
| Indeed, terraform is great for exactly the opposite: For making
| sure both the initial infrastructure and subsequent
| modifications to it are first proposed, discussed, reviewed in
| code and applied afterwards instead of anyone with privileges
| tinkering with settings on an "as needed" basis in whatever
| console thereby ending up with an infrastructure where you have
| no idea who turned on the frobnicator, why it was set to 11,
| and what might be the consequences of changing the setting.
| d0gsg0w00f wrote:
| We experienced a lot of problems when trying to manage
| standard configurations of hundreds of AWS VPCs ranging in
| age from 3 days old to 6 years. Accounts built years ago
| using older terraform would have to be handled completely
| differently because inevitable TF template drift and TF
| versions made each upgrade path unique. Not insurmountable by
| any means but also not trivial. Just sharing my experience.
| 3np wrote:
| The author seems to have some misunderstandings on how Terraform
| is supposed to work - you should get the "automatic
| reconciliation" they're saying is missing. Also,
|
| > I run once again the "terraform apply" command. But for some
| reason, Cloudflare API doesn't answer and I got completely stuck
| there without the possibility to update with Terraform this field
| because of linked dependencies.
|
| You should be able to circumvent this with a `-target`.
|
| That being said, I know exactly what they're talking about with
| helm. IME the helm provider was/is a complete mess and gets
| inconsistent state a lot. Helm specifically I would also keep out
| of TF until that is fixed, if ever. I haven't had that happen
| with other providers, though. Perhaps OP was just really unlucky
| ending up with the odd half-broken AWS module.
| pojzon wrote:
| I don't have an issue with terraform, as it has very clearly
| defined use cases.
|
| I have issues with all the providers that make no sense like
| application configuration providers or all the flavors of kubectl
| providers..
|
| Those are often very low quality and have various issues
| dedicated solutions don't have.
|
| An example could be helm and helm_provider. The former just
| works, with the latter I'm constantly running into weird bugs that
| break terraform state..
| digianarchist wrote:
| You can run terraform apply against a particular resource which
| will only provision that resource and its dependencies.
|
| tfstate files can be painful to manage, we had a lot of trouble
| with them at Capital One but mostly because:
|
| 1. People would modify state outside of TF which you should
| avoid.
|
| 2. People didn't architect their apps well which led to long
| lived infra. TF works best with cattle like infra.
|
| terraform import feels very much like an afterthought which is
| why projects like terraformer exist.
| picardo wrote:
| Terraform can be frustratingly slow at times. You have to realize
| that at the end of the day, it's an abstraction layer on top of
| the public APIs of a cloud service. If all your services are
| hosted on a single cloud, you don't need Terraform.
|
| We saw a huge improvement in our build times after we started
| using AWS CDK directly.
| amarshall wrote:
| What does the AWS CDK do differently than use public APIs?
| scrollaway wrote:
| CDK generates cloudformation stacks. Those stacks are
| deployed as units, within AWS itself. AWS treats all the
| resources as part of that stack etc; it's a concept entirely
| proper to AWS.
|
| Terraform can create cloudformation stacks as well, you just
| have to write the resources for it. It doesn't really make
| sense to do that. I also don't know that it's ... "faster" in
| any way; cfn is really slow.
| orf wrote:
| It's just tags on the resources and a managed statefile.
| There isn't anything different between a bucket created via
| CDK and a bucket created via terraform, the resources are
| the same and the API calls to create them are also the
| same.
| kenerwin88 wrote:
| Hmm, as a former AWS employee who has used both heavily, my
| experience has been the opposite.
|
| Terraform's AWS provider calls the APIs directly, whereas CDK
| generates Cloudformation, an abstraction on top of the AWS
| APIs. For me, using Terraform was significantly faster than
| applying the same stack via CDK.
|
| Or do you mean you're able to iterate faster writing CDK vs TF?
| picardo wrote:
| Thinking back on it, we always used Terraform with Pulumi,
| which creates its own abstraction layer for a CF stack. It's
| hard to pinpoint where the root cause of the slowness was..
| but in principle having fewer abstractions allowed us to
| iterate faster, and fix the bugs more quickly.
| snom380 wrote:
| From my experience, terraform is almost always slow because
| it's making API calls out to the cloud providers, and a lot of
| that in turn is slow because many providers offer only "eventually
| consistent" APIs, which terraform needs to compensate for by doing
| roundtrips to validate that changes have become visible
| (applies failing because of that was a common problem in the
| early days of terraform).
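The read-after-write round trips described above can be sketched roughly as follows. This is a minimal illustration, not Terraform's actual implementation: `wait_until_visible`, its parameters, and the `read` callback are hypothetical names, and the backoff schedule is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until_visible(
    read: Callable[[], Optional[T]],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Poll `read` until it returns something other than None.

    `read` stands in for a "describe/get" API call that may still
    return nothing shortly after a successful create call.
    """
    for attempt in range(attempts):
        value = read()
        if value is not None:
            return value
        # Exponential backoff between reads: base_delay, 2x, 4x, ...
        sleep(base_delay * (2 ** attempt))
    raise TimeoutError(f"resource not visible after {attempts} reads")
```

Each extra read is another network round trip, which is one plausible reason an apply feels slow even when the create call itself returned quickly.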
| ulzeraj wrote:
| I'd rather devote my time learning an agnostic tool like
| terraform. I can be part of your team right now working on AWS
| but tomorrow I might be working for an Azure shop.
| qaq wrote:
| CDK synth is reasonably fast but CF is really slow
| gjhr wrote:
| > When you run Terraform against AWS on the subnets part, it will
| create (anytime you deploy) the missing subnets
|
| That is one of the core features of Terraform? Detecting and
| fixing drift is useful.
| NKosmatos wrote:
| Came here to read about planet terraforming but instead I learned
| about (yet another) cloud deployment tool :-)
| scrollaway wrote:
| Ah, congrats on being part of today's ten thousand.
|
| I very highly recommend investigating it more and trying it a
| bit. Terraform isn't mere cloud deployment.
|
| As a small project you can start by deploying an EC2, RDS and
| some cloudflare records to go with them, all linked together
| with terraform. This will give you an initial idea of its
| capabilities.
| rad_gruchalski wrote:
| > learned about (yet another) cloud deployment tool
|
| Please don't read it as an attack, not the intent. Amazing that
| a HN regular can "learn about Terraform" only in the latter
| part of 2021!
| jhgb wrote:
| I made the same mistake. First hearing about this. It turns
| out that different people may have different backgrounds! ;)
| johannes1234321 wrote:
| https://xkcd.com/1053/
|
| I assume there is some selective reading and most articles
| referring to Terraform have a cloud or hashicorp reference in
| the title. If you don't care about either, you don't read the
| Terraform things on HN.
| nojvek wrote:
| Yeah the tfstate was a big gotcha. Pulumi has the same problems.
|
| What we really need is automatic reconciliation. I.e ask the
| provider what they have and then diff against that.
|
| Or periodically auto-importing.
|
| Are there any good solutions to auto-importing?
| robertlagrant wrote:
| What's the point of reimporting? Aren't you then no longer
| separating desired state from actual?
| kall wrote:
| Hm, maybe what we really need is a new entrant in hyperscale
| clouds that is built from the ground up for IaC and just does
| away with the split between state and reality. I would love to
| see one, anyway.
| arpinum wrote:
| The hard part with auto-importing is the resource id. This is
| usually generated server-side and is not included in the hcl.
| Often resources define a user-supplied identifier as well, such
| as a name property, and this could be used for auto-importing
| if the property has unique constraints applied against it.
| However, not all resources have this feature, so it's not a
| universal solution.
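A rough sketch of the name-based matching described above, assuming each remote resource exposes an `id` and a `name`. The function name, the data shapes, and the example addresses are all hypothetical; no real provider API is used here.

```python
from collections import Counter
from typing import Dict, List


def match_by_name(
    declared: Dict[str, str], remote: List[Dict[str, str]]
) -> Dict[str, str]:
    """Map config addresses to server-side ids via a user-supplied name.

    declared: config address -> name written in the configuration
    remote:   resources listed from the provider, each with "id" and "name"
    Names that appear more than once remotely are skipped, because
    without a uniqueness constraint the match would be a guess.
    """
    counts = Counter(r["name"] for r in remote)
    unique_ids = {r["name"]: r["id"] for r in remote if counts[r["name"]] == 1}
    return {
        address: unique_ids[name]
        for address, name in declared.items()
        if name in unique_ids
    }
```

Anything left unmatched (no remote resource, or an ambiguous name) would still need a manual import, which is the limitation the comment points out.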
| scrollaway wrote:
| It's very difficult to communicate what Terraform's strengths and
| weaknesses are to someone who's never used it or IAC in general.
|
| Spend enough time playing with it and understanding it, you'll
| end up like me thinking about all the shit you configure left and
| right such as hooking up Stripe's secret keys, Google Analytics
| and the webmaster console, and just about everything else we
| configure via web interfaces, and you'll think:
|
| Why can't we use Terraform for this as well? Manage these SaaS
| products the same way we manage the rest of our cloud, test and
| audit changes, automatically roll secrets and update anything
| that needs updating the moment you change a setting.
|
| Ah well. Not enough APIs out there. And it's difficult to write
| and maintain terraform plugins for these throwaway cases
| especially if they are going to use private APIs. Anyone know if
| Pulumi plugins are easier to write?
| brightball wrote:
| IMO, this is an area where I think Terraform + Ansible pairs so
| well together.
|
| If there's ever a gap in what Terraform offers you can pretty
| easily fill it with Ansible.
| weitzj wrote:
| And this setup is also encouraged by HashiCorp (at least I
| saw a talk by them). Use ansible for your "smart" sequential
| executions and Terraform as a sane wrapper for state.
| chucky_z wrote:
| I've found running Terraform via Ansible to be a pretty
| good experience.
| weitzj wrote:
| So the other way around. Do you employ a GitOps approach
| this way?
|
| I find it hard to figure out how to use GitOps with
| Ansible? How do you make a PullRequest which indicates
| that something should get deleted? You still would have
| to keep around an ansible playbook for the stuff you want
| to delete.
| Octabrain wrote:
| I've seen (and fixed) so many ugly messes at this point
| made as a result of mixing and wrapping tools with
| different purposes together like Ansible + Terraform that
| it's something I strongly discourage. I recommend keeping
| the boundaries and responsibilities of the tools clear. In
| this case, Terraform for the creation of resources and
| Ansible for the configuration of those resources. In my
| opinion, this results in a much simpler and more
| maintainable ecosystem in the long run.
| [deleted]
| pram wrote:
| Ansible is also amazing with Packer as the provisioner.
| kall wrote:
| Your intuition is right on pulumi. Creating these kinds of
| extensions is minimal effort. You can start out by adding a
| "dynamic resource" class to your infra codebase and extract it
| into a plugin later, or not.
|
| These are not the same as "real" pulumi providers that run
| across all supported programming languages, but I think they
| are a good enough fit for the cases you mention.
| crabmusket wrote:
| I've used https://github.com/Mastercard/terraform-provider-restapi
| successfully with a cloud provider which provides a
| suitable HTTP API. There was a bit of fiddling with JSON
| formatting and their API docs, but it wasn't too hard all in
| all.
|
| But like you say - now I've done that, I want to do it for
| every UI that I'm forced to log in to!
| weitzj wrote:
| Yes. We use the Rest Api Provider extensively to provision
| ElasticSearch and Kibana.
| MuffinFlavored wrote:
| what's the backing database for this, the .tfstate file? do
| the resources/secrets you create end up getting "backed up"
| (committed) to git?
| scrollaway wrote:
| Wow, neat, thanks for the link. Maintained by Mastercard, eh?
| Anyone else on HN used this or worked on it?
| mason55 wrote:
| This is how I feel about any kind of configuration after moving
| all my personal systems to NixOS.
|
| "What do you mean run an installer and update these files..."
| satya71 wrote:
| Pulumi has dynamic providers. You define the crud operations
| and Pulumi manages the state.
| reilly3000 wrote:
| After a little grokking it's a surprisingly easy way to
| manage arbitrary API resources. They are just functions in
| your language of choice and can accept parameters from any
| dependency. It's also possible to roll your own provider, but
| dynamic providers cover all kinds of use cases.
| [deleted]
| vasco wrote:
| I wish I could use it for everything as well. Every other tool
| the business depends on just scares me to shit that its config
| isn't in code and we can't have a proper backup of it. At least
| Datadog supports terraform for most things and not only do we
| manage all our infra through terraform, we manage all our
| monitoring with it too. I doubt very much I'll ever go back to
| non-monitoring-as-code if that's even a term.
|
| Infrastructure, all the monitoring as well as all the on-call
| rotation configurations (and anything else that is in that
| loop) should all be in code, and all changes should be reviewed
| the same way application code is. If they aren't, you can't
| really trust you're gonna be alerted properly when things start
| breaking.
|
| I wish I could use it for personal things too, I'd rather have
| my bank account settings, my government tax information, yada
| yada in a personal terraform repository for example. Change of
| address? Commit a change, check if the plan is good and apply
| to change it everywhere. Though having lots of experience with
| Terraform I can only imagine what the equivalent of trying to
| delete an S3 bucket that still has data in it is for a bank
| account.
| OJFord wrote:
| I so agree; I tried/am trying to write an Android provider -
| currently just have app (un)installation working, and not
| very well, I expected settings management to be the hard
| part, but egh.
|
| Why can't everything have nice public APIs! And, while I'm at
| it, some sort of all-encompassing ticketing system, hell even
| if it were Jira. 'Pothole', assign local council. Blocked on
| '2021 roadworks funding increase', backlogged. Assigned to
| councillor. Won't fix. Ok - maybe I'm not making it sound
| _great_ , but at least you could see some reasoning, and what
| the blockers are. Follow the chain to work out that 'communal
| lobby needs repainting' hasn't been acted on by building
| management company because, ultimately, of global supply
| chain disruption and the contractor's supplier's supplier
| can't get any paint ingredients.
| poisonta wrote:
| I had mostly similar feelings four years ago. As I understood the
| reasons behind them, I started respecting more the people at
| HashiCorp. They are really smart.
| nrvn wrote:
| One of the biggest reasons that have kept me away from terraform
| apart from the esoteric language is that terraform modules are
| always a few steps behind the upstream public cloud
| offerings.
|
| In the sense that whenever there's a new API or service available
| in any of public clouds and their official SDKs there will always
| be a delay before this new service/feature/API will become
| available in terraform.
|
| First time I encountered it with GKE private clusters 3 or 4
| years ago. Now it is AWS Keyspaces.
|
| The second biggest reason is whenever you have a requirement for
| a hybrid or multicloud then well you are left with rigidity of
| HCL. It is probably doable but for what sake?
|
| Solution: get a real language, write a STATELESS configuration
| management(IaC) system for your own needs and maintain it. The
| majority of public and private cloud providers ship SDKs in most
| popular languages that will help you build your own software
| solution and reduce your dependence on a third party which I
| would put under progressing operational risk category.
| Yaml/json/cue/toml for end user configs would suffice.
|
| Example: for one of my previous projects we built a tool for a
| hybrid AWS-openstack setup, and were managing a dozen busy
| environments.
| oceanplexian wrote:
| > Terraform modules are always a few steps behind the
| upstream public cloud offerings.
|
| My experience has been the exact opposite. Usually Terraform
| offers support for cloud services long before the vendor
| provides an SDK or supports it with their own offering (e.g.
| Cloudformation). There are still dozens of AWS services, for
| example that have no CF support offered by AWS.
| nrvn wrote:
| The emphasis on stateless here is that your desired state is
| described in code that resides in your repository. Actual state
| is what you have in your cloud. No need to spend time on state
| format, storage and related logic and complexity
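The stateless approach described above boils down to diffing the repository's description against whatever the cloud reports on each run. A minimal sketch, assuming both sides can be reduced to a name-to-attributes mapping (the function name and data shapes are hypothetical):

```python
from typing import Any, Dict, List


def plan(
    desired: Dict[str, Dict[str, Any]], actual: Dict[str, Dict[str, Any]]
) -> Dict[str, List[str]]:
    """Compare desired state (from the repo) with actual state (from the cloud).

    No state file is involved: `actual` is fetched fresh on every run.
    """
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name
        for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}
```

The catch is the `delete` list: without a record of what the tool itself created, it has to treat everything it can see as its own, or rely on a convention such as tags to scope the comparison.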
| xyzzy_plugh wrote:
| This is my preference as well. I've done everything from
| makefiles and bash scripts to a monolith Go program that
| statelessly provisions/tears down resources.
|
| Even makefiles are pretty straightforward, though you really
| want operations to only trigger when checksums differ --
| timestamps result in a lot of redundant operations. As long as
| everything is idempotent, it's pretty straightforward.
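The checksum-instead-of-timestamp idea above might look like this in a small script. It is only a sketch: `apply_if_changed` and the `seen` cache are hypothetical, and the cache is a local optimisation, not a source of truth about the cloud.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def checksum(spec: Dict[str, Any]) -> str:
    """Hash a spec by content, independent of key order."""
    encoded = json.dumps(spec, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def apply_if_changed(
    name: str,
    spec: Dict[str, Any],
    seen: Dict[str, str],
    apply: Callable[[Dict[str, Any]], None],
) -> bool:
    """Run `apply` only when the spec's checksum differs from the last run.

    `apply` is assumed to be idempotent, so losing the cache only costs
    a redundant call, never a wrong result.
    """
    digest = checksum(spec)
    if seen.get(name) == digest:
        return False
    apply(spec)
    seen[name] = digest
    return True
```

Because the comparison is on content rather than modification time, touching a file or checking out a fresh clone does not by itself trigger any work.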
| jounker wrote:
| What countries have fewer deaths per 100K than Sweden? All of
| Sweden's neighbors.
| duskwuff wrote:
| Wrong thread?
| phendrenad2 wrote:
| Terraform basically gives you what cloud providers should have.
| AWS/Azure are these overcomplicated web interfaces, or
| undocumented REST APIs, and Terraform gives you a simpler way to
| configure stuff.
| danw1979 wrote:
| This article is not entirely incorrect, but there are some glaring
| falsehoods as others have pointed out.
|
| There's a plug for managing Helm related resources with the
| author's own SaaS product at the end, so I'll file this under
| "half hearted hit job".
| cube2222 wrote:
| Terraform definitely has its warts, though, as other commenters
| wrote, not everything in the article is true (the reconciliation
| part): dependency resolution blows up in time as your number of
| resources grows, so you need to split up your statefiles; it
| can't passively listen for drift happening in a dataflow-like way
| (that would be awesome); it's not transactional like
| CloudFormation (which is more of a tradeoff than a con), and
| more.
|
| It is however a great improvement over the previous ways of doing
| things, and probably the best out of the current similar
| alternatives out there (you might mention Pulumi as a strong
| contender, especially for AWS glue writing).
|
| And - though as per the disclaimer, I may be biased - until a
| better tool comes up, I'd advise looking for specialized IaC
| CI/CD tools to ease your path with Terraform, like Spacelift[0].
|
| It can help you with orchestrating dependencies among multiple
| state files; take care of scheduling regular drift
| detection/reconciliation without getting in your way and locking
| your state; give you a policy system for making sure preventable
| mistakes don't happen (i.e. recreating a resource you definitely
| never want to recreate); manage your credentials depending on
| whether you just want to run a plan, or apply your changes, and
| much more.
|
| I can't imagine doing Terraform again without a tool like it.
|
| Disclaimer: Software Engineer at Spacelift. If you want me to
| expand on the "and much more" part, you can find a demo-
| scheduling link in my bio!
|
| [0]: https://spacelift.io
| 0xbadcafebee wrote:
| Terraform is actually kind of a nightmare. It's deceptively
| simple yet requires a massive amount of real world expertise to
| use it properly. It's a configuration management tool, but more
| difficult to use and extend.
|
| I'm thinking of designing a series of tools to replace Terraform.
| The idea would be to break down how modern cloud environments are
| managed into a couple concepts, and then make a variety of tools
| that work within those concepts together, so that it's easy to
| expand and modify the way you use them for your use case. This
| would enable things like tailoring the use of the tools to a
| particular deployment strategy, or adding custom business logic,
| or replacing individual functionality, without being tied to one
| tool, language, etc.
| kleinsch wrote:
| Surprising that the author is writing a tool for managing your
| servers, so writes a post about how Terraform isn't great at
| managing your servers...
|
| It seems like the root of the problem is that the author wants to
| use Terraform to manage their AWS state, but also wants to use
| the web console to directly change things, so Terraform gets out
| of sync. Terraform has a command to handle this -
| https://www.terraform.io/docs/cli/commands/refresh.html
| throwaway894345 wrote:
| As the sibling commenter notes, Terraform has a refresh flag,
| but I wonder if Kubernetes' model is better here. Rather than a
| one-off process that tries to update everything, Kubernetes has
| many small controllers which are essentially processes on the
| cluster that just run a control loop. Each controller
| corresponds to one resource type, so it will just loop over
| incoming events that pertain to the resource type in question
| and attempt to reconcile the state of the target resource
| instance with the desired state. If it fails initially, it will
| retry with back off. If something doesn't stabilize after
| several minutes, an alert can notify a human.
|
| The key differences between the controller approach and the IaC
| approach are, I think, lots of little processes continuously
| reconciling state for all resources of a given type (many small
| loops that touch all resources of a given type on the entire
| cluster) versus a one-off process that tries to touch just the
| resources it cares about and if it fails it just gives up.
|
| One thing Kubernetes _definitely_ improves upon Terraform is
| that Kubernetes uses a YAML "assembly language" for its infra
| as code, but that YAML could be generated by a real programming
| language. Terraform expects you to write HCL, which is an
| accidentally re-invented programming language (every IaC tool
| provider thought static configs like YAML would suffice as a
| human interface, but as they gradually realized the need for
| more dynamism, Terraform and others would bolt on one dynamic
| feature after another until they had a slow, unfamiliar, and
| counterintuitive programming language). Terraform has a CDK
| that allows writing in other languages, but I'm skeptical that
| it liberates you from Terraform's model of the world (e.g., if
| I rename a variable in CDK, does it try to destroy and recreate
| the underlying resource as with Terraform?). I'm also concerned
| that rather than allowing us to generate YAML in the obvious
| way, it will require bizarre inheritance patterns like the AWS
| CDK. I would be curious to hear from folks who have used the
| CDK.
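The controller model described above (one loop per resource type, retry with backoff, alert a human if it never stabilizes) can be sketched as follows. This is an illustration of the idea only, not how any real Kubernetes controller is written; `run_controller` and its parameters are hypothetical.

```python
from collections import deque
from typing import Callable, Iterable


def run_controller(
    events: Iterable[str],
    reconcile: Callable[[str], None],
    max_attempts: int = 3,
    alert: Callable[[str, Exception], None] = lambda name, exc: None,
    sleep: Callable[[float], None] = lambda seconds: None,
) -> None:
    """Process events for one resource type, retrying failures with backoff.

    `reconcile(name)` is assumed to drive one resource instance toward its
    desired state and to raise if it could not. A failed item is requeued
    behind the others instead of aborting the whole run.
    """
    queue = deque((name, 0) for name in events)
    while queue:
        name, failures = queue.popleft()
        try:
            reconcile(name)
        except Exception as exc:
            if failures + 1 >= max_attempts:
                alert(name, exc)  # give up and notify a human
            else:
                sleep(2 ** failures)  # back off: 1s, 2s, 4s, ...
                queue.append((name, failures + 1))
```

The contrast with a one-off apply is that a failure on one resource does not stop work on the rest, and a real controller would keep consuming events indefinitely rather than exiting when the queue drains.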
| the_duke wrote:
| Terraform is also working on allowing actual code:
| https://github.com/hashicorp/terraform-cdk
| throwaway894345 wrote:
| I know, I mention that in my comment :)
| ClumsyPilot wrote:
| "every IaC tool provider thought static configs like YAML
| would suffice as a human interface, but as they gradually
| realized the need for more dynamism, Terraform and others
| would bolt on one dynamic feature after another until they
| had a slow, unfamiliar, and counterintuitive programming
| language"
|
| I constantly see so many people step on the same rake, it's
| incredible. Tools like Tilt let you use python, it's a much
| more sensible approach.
| leg100 wrote:
| The Hashicorp co-founders considered alternative approaches
| when originally designing terraform. The actor model was
| considered but dismissed. That's not a million miles away
| from the kubernetes reconcile loop model:
|
| > Then we transitioned to an actor-based model where each
| resource was almost an actor, and there was a message-passing
| interface between them.
|
| > This allowed the system to be highly concurrent the way
| Terraform is today, but also confusing for users to deal with
| and very difficult to build a programming model around,
| because the ordering of execution was so random and
| everything was happening concurrently.
|
| https://www.hashicorp.com/resources/terraform-fireside-
| chat-...
|
| They may still be right. Kubernetes' approach may seem more
| attractive but terraform is far more pragmatic in its design.
| verdverm wrote:
| Terraform accepts JSON format as an alternative to HCL.
|
| I prefer CUE to JSON for TF and many other tools now
| throwaway894345 wrote:
| To be clear, the issue isn't the HCL syntax. You could
| similarly use Cue to generate HCL. The problem is using
| Terraform's dynamic features which were poorly designed.
| steveb wrote:
| There are a lot of developments around using Kubernetes as an
| IaC platform for the reasons in your comment. The combination
| of a standard API model in CRDs + the controller model maps
| nicely to managing infrastructure and exposing resources to
| developers.
|
| <https://crossplane.io> just graduated to CNCF Incubation and
| each of the cloud providers is working on K8s controllers
| and code generators (like AWS Controllers for Kubernetes,
| Google Config Connector, and the Azure Service Operator).
| wernerb wrote:
| The reasoning you put in also means kubernetes is unsuited to
| be controlled by terraform. Too many lifecycles (resources)
| to centrally control. kubernetes custom resources can have
| dependencies on others, which terraform would need to support
| as well, and that is not doable to maintain. Keep your
| kubernetes manifests outside of your terraform state.
| throwaway894345 wrote:
| My company manages a lot of Kubernetes manifests with
| Terraform without issue. Terraform is just generating the
| manifests in this case; Kubernetes is doing the
| reconciliation work. More complex than is ideal (i.e., if
| we were starting out with Kubernetes we probably wouldn't
| use Terraform) but it works reasonably well.
| orf wrote:
| There's also a refresh flag on plan. It's always worth using
| before your apply CI step.
| snom380 wrote:
| > On terraform it's different, because of the tfstate. All the
| deployed elements are stored in the tfstate, re-running terraform
| won't update resources that are supposed to be in a specific
| state but are not.
|
| This is incorrect, and makes me wonder how the author has used
| terraform. Terraform will certainly detect differences between
| managed and current state for the resources it manages whenever
| you do a plan/apply.
|
| The major challenge is that terraform _can only reconcile
| resources or configuration values it knows about_, and that
| depends very much on how a particular cloud vendor or terraform
| provider has modelled resources. I believe the Helm provider is
| one example where it (at least in the past) hasn't had a good
| way to reconcile state.
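The limitation described above, that only modelled attributes can be reconciled, can be illustrated with a small sketch. The function name, the attribute names, and the data shapes are hypothetical and do not reflect Terraform's internals.

```python
from typing import Any, Dict, Iterable, Tuple


def detect_drift(
    recorded: Dict[str, Any], live: Dict[str, Any], modelled: Iterable[str]
) -> Dict[str, Tuple[Any, Any]]:
    """Return {attribute: (recorded, live)} for modelled attributes that differ.

    Attributes the provider schema does not model are never compared,
    so a change to them goes unnoticed.
    """
    return {
        key: (recorded.get(key), live.get(key))
        for key in sorted(modelled)
        if recorded.get(key) != live.get(key)
    }
```

In this sketch a manual change to an attribute outside `modelled` produces an empty result, which is the kind of blind spot a thinly modelled provider would have.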
| forty wrote:
| The stage when terraform reads the reality to compare it to the
| current state is called "refresh". It can optionally be
| skipped.
| shadycuz wrote:
| I also agree, but I don't have experience with the providers he
| is using.
|
| When working with AWS it will always reapply the terraform
| configuration.
|
| A great use of this is for account hardening. You can run it
| daily to make sure it's configured correctly.
| OJFord wrote:
| It would be even better if you could somehow tell it it
| should have control over the entire account, so that anything
| (entire resources I mean, not just changed properties)
| created outside terraform would be destroyed.
|
| In terms of API use though I suppose that'd be quite
| expensive to plan - listing every possible AWS resource (in
| every region!) for example.
| joombaga wrote:
| You'd have to go per-region first to avoid a major redesign
| on the provider (the API is also per-region). I can see
| something like a `controlled_resource_types` list attribute
| on the provider that you could set to e.g. `[aws_instance]`
| to inform the provider it needs to compare the list of
| resources of the specified type to the state.
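The comparison proposed above is essentially a set difference per resource type. A minimal sketch follows; note that `controlled_resource_types` is the commenter's suggested attribute, not an existing provider option, and the function and data shapes here are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def unmanaged_resources(
    live: Dict[str, Set[str]],
    state: Dict[str, Set[str]],
    controlled_resource_types: Iterable[str],
) -> Dict[str, List[str]]:
    """List ids that exist in one region but are absent from the state.

    Only the explicitly controlled types are listed, which keeps the
    number of API calls bounded.
    """
    result = {}
    for resource_type in controlled_resource_types:
        extra = live.get(resource_type, set()) - state.get(resource_type, set())
        if extra:
            result[resource_type] = sorted(extra)
    return result
```

Whatever this returns would be the candidates for destruction (or import) under the "terraform owns the whole account" model.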
| sciurus wrote:
| Relatedly, I'm pretty sure the earlier statement
|
| > For some resources like RDS or EKS, it won't check if the
| resource already exists or not. So if it's missing, nothing is
| going to happen as it's marked are deployed in the tfstate file
|
| is also wrong.
| joombaga wrote:
| It is, at least for most RDS and EKS resources (RDS cluster,
| cluster instance, parameter group, EKS cluster)
| [deleted]
| raffraffraff wrote:
| There's also brokenness around terraform for_each and
| providers. If you have a module that creates a Kubernetes
| cluster and then applies a helm chart to it, you can't convert
| it to one that takes a bunch of cluster definitions that it
| iterates over using for_each. Basically, there is no way to do
| this in a data-driven way. Sucks.
| robertlagrant wrote:
| The author also seems to think DSL means "descriptive
| languages" and that Helm "even" supports kubernetes, when in
| fact it's only for that technology.
| dead10ck wrote:
| This stuck out to me too. Terraform absolutely does check the
| current reality of the state and applies changes to do what the
| HCL tells it to. They are either using terraform in a really
| weird way, or this article was written by someone that doesn't
| actually run terraform themselves.
| jrochkind1 wrote:
| Phew! I thought I was misunderstanding terraform when I saw
| that.
|
| Perhaps they are working with certain poorly implemented or
| buggy providers, and not realizing those providers were doing
| something different than terraform's properly working
| behavior that most providers implemented.
|
| Bugs happen, but the first step is agreeing on intended
| behavior so we know what's a bug!
| marcinzm wrote:
| We use Terraform for cloud infrastructure and, basically, helm
| deploys of external apps. In other words things that don't change
| too often and the update of which has to be managed carefully
| anyway. The internal apps which get deployed at a much faster
| cadence use helm directly.
| danielovichdk wrote:
| I run only on Azure. Nearly all my IAC is written in PowerShell
| utilizing the Azure CLI.
|
| Terraform and yaml as well are so verbose and you have no clue
| what's going on from the local side of things.
|
| How do you debug your terraform markup locally? My guess is you
| can't.
| awslattery wrote:
| The console command [1], mixed with local variables, is quite
| handy for debugging locally.
|
| 1. https://www.terraform.io/docs/cli/commands/console.html
| x86_64Ubuntu wrote:
| Seems as if people have very strong feelings about Terraform,
| but little actual experience using it.
| toyg wrote:
| It's not the easiest tool to grok, writing configuration
| files can be very verbose, and its opinions about a number
| of things can be very off-putting. I tried to "get into" TF
| many times and never actually fell in love. I like the
| idea, I like the cross-cloud approach, I just don't
| particularly like what the experience ends up being. This
| shit should be easier.
| mvanaltvorst wrote:
| I wish Terraform were less opinionated. It has a very clear set
| of rules you have to adhere to, and if you try to do anything
| remotely complex you will encounter barriers left and right.
|
| An example is the fact that `for_each` is not supported on
| providers [1], an issue with 230 likes which has not been solved
| since January 2019. This had me resort to a Python script which
| generates a `.tf.json` file, definitely not ideal. Infrastructure
| as code sounds great, but in practice it's closer to
| "infrastructure as a non-standard markup language".
|
| [1]: https://github.com/hashicorp/terraform/issues/19932
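|
| A minimal sketch of the kind of generator script described above,
| assuming the goal is one aliased AWS provider per region. The region
| list, the bucket resources and their names are hypothetical and only
| there to show a resource pointing at a generated provider; the
| original script is not shown in the thread.

```python
import json


def build_config(regions):
    """Return a Terraform JSON-syntax document with one aliased AWS
    provider per region and one example bucket per provider."""
    providers = [{"alias": r.replace("-", "_"), "region": r} for r in regions]
    buckets = {
        f"logs_{p['alias']}": {
            # JSON syntax refers to an aliased provider as "<type>.<alias>".
            "provider": f"aws.{p['alias']}",
            "bucket": f"example-logs-{p['region']}",  # hypothetical name
        }
        for p in providers
    }
    return {
        "provider": {"aws": providers},
        "resource": {"aws_s3_bucket": buckets},
    }


config = build_config(["us-east-1", "eu-west-1"])
rendered = json.dumps(config, indent=2, sort_keys=True)
print(rendered)  # redirect to a file such as generated.tf.json
```

| The loop lives in Python, so Terraform only ever sees a static list of
| provider blocks and never needs `for_each` on providers.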
| throwaway894345 wrote:
| You have to understand that when IaC was new, the marketing was
| "it's so simple you can just write YAML/JSON/etc" because
| frankly the industry was too dumb to understand that "using a
| real programming language to _generate a description of the
| desired resource state_ " and "using a real programming
| language to imperatively reconcile the current and desired
| states oneself" are different things. So Terraform began with
| something that resembled YAML in its static-ness, and over
| time, more power was required so they would bolt on a dynamic
| feature but were reluctant to give the impression that they
| were building a programming language so the feature would be as
| obscure as possible. But that wouldn't be enough either so they
| would add still more dynamic features, each comparably obscure
| until in time they'd built a complete, obscure programming
| language.
|
| But this wasn't just Terraform! The entire industry did this
| too. CloudFormation began as simple JSON, but over time they
| allowed you to encode the abstract syntax tree of a shitty
| programming language in your YAML, and CloudFormation would
| interpret it. However stupid that may sound, in the Kubernetes
| world, we have Helm which lets you generate YAML with _text
| templates_ which is honestly the dumbest idea in the world
| (imagine a compiler that generates syntactically invalid
| machine code if the input program has an extra white space
| character).
|
| Of course in all of these cases the answer is staring us in the
| face: use a static language (YAML, JSON, etc) to _describe_ the
| desired state, and use a higher level language (like Python or
| Starlark or Dhall or etc) to _generate_ that static desired
| state description. The only thing Terraform (or any IaC tool)
| should care about is the YAML description. That it is generated
| from Starlark or TypeScript is just an implementation detail.
|
| Instead of that, though, we get CDKs which are _so close_ , but
| admittedly I haven't used them in anger yet.
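|
| A rough illustration of the "generate a static description" idea,
| assuming a Kubernetes-style Deployment as the target. The
| `deployment` helper, the image name and the environment list are
| made up for this sketch. The output is JSON, which YAML parsers
| generally accept, and because it is serialized from data rather than
| assembled from text it cannot come out syntactically invalid.

```python
import json


def deployment(name, image, replicas):
    """Build a Kubernetes-style Deployment as plain data."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# The generator can use loops and functions freely; the tool consuming
# the output only ever sees a static document.
manifests = [
    deployment(f"web-{env}", "example/web:1.0", replicas)
    for env, replicas in [("staging", 1), ("prod", 3)]
]
static_description = json.dumps(manifests, indent=2)
print(static_description)
```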
| x3n0ph3n3 wrote:
| One of the best parts of CloudFormation was their
| introduction of Macros. You can take either your whole
| template or just a snippet, and perform dynamic
| transformations by calling a lambda. I'll admit I've gone so
| far as being able to embed ERB (Ruby) into my templates in
| order to more dynamically define some resources based on
| stack parameters. I can also create N resources with common
| configuration based on the values of a CommaDelimitedList.
| throwaway894345 wrote:
| I think the idea here is that macros are neat in any
| language, but in CloudFormation they can help automate
| stuff that is only difficult because of CloudFormation, and
| the macros themselves are harder to use than those in a
| normal programming language. In all cases, I think it's
| strictly less nice than generating your CloudFormation YAML
| with Python or similar.
| kevincox wrote:
| I think this is less that it is opinionated but more that the
| HCL evaluation feels like a pile of hacks. There are unclear
| rules on what can be evaluated when and what dependencies are
| possible. Part of this is so that `plan` can work as it does,
| but it seems like there are just major gaps in general. For
| example providers can't depend on resources. This makes it very
| difficult to for example set up EKS then use the kubernetes
| provider to manage the resources in the cluster. The solution
| is obviously separate stacks but that brings in a whole bunch
| of other problems.
|
| I think Terraform is quite possibly the best tool available,
| but there are clear flaws with both the model and the
| implementation. I think if I were to make a Terraform v2 I
| would make `plan` completely pure. This would avoid the
| provider issues, make validation and testing in CI easier and a
| whole bunch of other benefits. Of course there are downsides.
| For example EC2 instance IDs are random so you can't just
| include them in your pure plan. You would need some type of
| placeholder that is used for evaluation. This does cause some
| issues as it limits the operations that you can do with that
| value (so you can't pick the instance size based on the random
| instance ID) but overall I don't think it would be a major
| issue if the final substitution was handled well by the
| framework.
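|
| One way to picture the placeholder idea is the toy model below. It
| is not how Terraform works internally; `Unknown`, `plan` and
| `resolve` are hypothetical names, and the rule that `size` may not
| depend on a placeholder is just the example restriction from the
| paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unknown:
    """Placeholder for a value that only exists after apply."""
    ref: str  # e.g. "instance.web.id"


def plan(config):
    """Pure function: no API calls, so unknowns stay as placeholders."""
    steps = []
    for name, attrs in config.items():
        for key, value in attrs.items():
            if isinstance(value, Unknown) and key == "size":
                # Restricting what a placeholder may feed keeps plan pure.
                raise ValueError(f"{name}.{key} cannot depend on {value.ref}")
        steps.append((name, attrs))
    return steps


def resolve(attrs, known):
    """Final substitution at apply time, once real IDs are known."""
    return {k: known[v.ref] if isinstance(v, Unknown) else v
            for k, v in attrs.items()}


config = {
    "instance.web": {"size": "small"},
    "dns.web": {"target": Unknown("instance.web.id")},
}
steps = plan(config)
# Hypothetical ID that a cloud API would return during apply.
resolved = resolve(steps[1][1], {"instance.web.id": "i-0123456789abcdef0"})
print(resolved)
```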
| wayoutthere wrote:
| The new use case for Terraform is IT departments using it as an
| IaC policy control system for self-service. Rather than push
| teams through a web interface, you just expose Terraform via
| Terragrunt to the dev teams and run their files through a policy-
| driven linter before executing. And make it so that Terraform is
| the _only_ way you can push to prod.
|
| I think people get into trouble with Terraform when they try to
| use it to do more than provision infrastructure. Things that
| should probably be part of build scripts, CI/CD pipelines or
| config management. Terraform isn't good at those things; but it
| is very good at provisioning cloud infra in a cloud-agnostic
| fashion.
| pvtmert wrote:
| I agree with most of your comment, except the cloud-agnostic part.
|
| Heck, I never get how Terraform can be cloud-agnostic... If
| everybody thinks sharing the same language (HCL) is equivalent to
| being cloud-agnostic, well, YAML exists...
|
| It is literally impossible to create a simple VM in 2
| different cloud providers without defining them twice with
| their own specific parameters.
|
| If you use AWS provider, resources start with aws_, if it's GCP
| it starts with gcp_ and so on. It is not possible to have a
| "resource vm { name = ... provider = aws }"
| jen20 wrote:
| Terraform provides the same _workflow_ across clouds, not the
| same resource model (which would be dumb, since it would
| necessarily provide only a lowest common denominator
| representation).
|
| > It is not possible to have a "resource vm { name = ...
| provider = aws }"
|
| It actually is via modules. It's a lot of work with basically
| no benefit though, so in practice people don't do this.
|
| Pulumi is better at specifically this kind of thing however,
| since you can implement a common interface which can be
| specialised for each available cloud.
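|
| A plain-Python sketch of what "a common interface specialised per
| cloud" could look like. This is not Pulumi's API; the class names
| and the abstract small/large size mapping are assumptions made for
| illustration. Only the resource type names and attribute names
| follow the AWS and Google Terraform providers.

```python
from abc import ABC, abstractmethod


class VirtualMachine(ABC):
    """Hypothetical cloud-neutral interface; not a Pulumi or Terraform API."""

    def __init__(self, name, size):
        self.name = name
        self.size = size  # abstract size: "small" or "large"

    @abstractmethod
    def render(self):
        """Return (resource_type, attributes) for one specific cloud."""


class AwsVM(VirtualMachine):
    SIZES = {"small": "t3.micro", "large": "m5.large"}

    def render(self):
        return "aws_instance", {
            "instance_type": self.SIZES[self.size],
            "tags": {"Name": self.name},
        }


class GoogleVM(VirtualMachine):
    SIZES = {"small": "e2-micro", "large": "e2-standard-2"}

    def render(self):
        return "google_compute_instance", {
            "name": self.name,
            "machine_type": self.SIZES[self.size],
        }


for vm in (AwsVM("web", "small"), GoogleVM("web", "small")):
    print(vm.render())
```

| Note how the shared interface can only expose what both clouds have
| in common (here a name and a coarse size), which is the lowest common
| denominator problem mentioned above.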
| airocker wrote:
| We use this exact stack, but we generally would not rely on
| tfstate. We will remove everything and regenerate it. It's not an
| operation done too frequently that we have a big problem. Also,
| helm we use as a separate layer that is applied after terraform
| and can be repeated many times. This changes often.
| jdub wrote:
| My biggest issue with Terraform is the impedance mismatch between
| HCL and the rest of the known universe.
|
| When you write a provider, you spend half your time converting
| data structures from the HCL submitted to your provider into the
| JSON your target service inevitably expects, then you spend half
| your time converting the JSON your target service inevitably
| returns into HCL for Terraform to consume, and then you spend
| another half of your time fixing bugs and polishing.
|
| It's okay when you're building simple providers, but anything
| reasonably complex becomes unwieldy. I had a go at building some
| providers for AWS services that were not supported by Terraform
| or CloudFormation... and I just retreated to cheesy Lambda custom
| resources for CloudFormation.
| gtirloni wrote:
| The upside though is that you're adapting that known universe
| to a way of working that makes sense for terraform _users_.
| ClumsyPilot wrote:
| Good tools are fit for the world we live in; bad tools
| require you to "adapt the known universe" to the tool.
| jdub wrote:
| But I wanted to be a Terraform user. So an alternative
| interpretation is that, as designed, Terraform is slowing
| down adaptation of the known universe for Terraform users.
| throwaway894345 wrote:
| Unfortunately HCL is also an unnecessary learning curve for
| users. I still don't have my head around a list versus a
| bunch of blocks with the same name, for example. I originally
| thought it was syntax sugar for a single mechanism, but I've
| had errors for trying to use one instead of the other before.
| kubanczyk wrote:
| I read you. Half of Terraform's value today is "let's
| have common aesthetics - especially the examples and
| snake_case_naming_convention - and convert _any_ REST API in
| the world to that".
|
| The typical example is to start from there:
|
| https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_V...
|
| and arrive here:
|
| https://registry.terraform.io/providers/hashicorp/aws/latest...
|
| If you review it carefully, it is apparent how much coding
| effort and many moving parts were used to perform a
| transformation which seems disproportionately primitive.
| reilly3000 wrote:
| I'm partial to Pulumi for this reason. It allows devs to use
| familiar languages to define infrastructure with their familiar
| tools, write tests, and even interop with existing terraform.
| dharmab wrote:
| An alternative opinion: When I worked at a large tech company, my
| team made a conscious decision to not use terraform. This gave us
| some key advantages: we were able to adopt new cloud features
| immediately, months before they were available in tf, and our
| direct cloud access let us build features that would surprise the
| teams using tf within the company.
|
| If your core competency isn't dependent on your cloud platform tf
| is a great tool. But using cloud APIs directly was great for us.
| jrsdav wrote:
| > But using cloud APIs directly was great for us
|
| This is fine, I've done it extensively myself for some of the
| bleeding-edge cloud stuff, but the importance of things like
| tracking state, managing hierarchical resource dependencies, or
| retry/back-off logic shouldn't be tossed aside simply because
| there are gaps in what's available in the Terraform providers.
| Especially where change management is important (basically any
| enterprise company).
|
| I'd caution others reading this against abandoning something
| altogether and writing bespoke IaC tooling simply because the
| stable approach doesn't cover every (bleeding) edge case.
|
| You'll spend a lot of time reinventing the wheel, and while
| it's fine for certain situations (like when you only care about
| desired state, not known state, for instance), you'll move
| faster (and likely safer) by sticking with tools like Terraform
| for the bulk of your infra, and augmenting here and there with
| cloud APIs/SDKs when needed.
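|
| To give a sense of just one of the pieces listed above, here is a
| small sketch of retry with exponential back-off. The `with_backoff`
| helper and the `flaky_create` stand-in are hypothetical, and treating
| `ConnectionError` as the retryable error is an assumption; a real
| cloud SDK raises its own throttling exceptions.

```python
import random
import time


def with_backoff(call, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying on ConnectionError with exponential back-off.

    The delay ceiling doubles on every attempt, and the actual wait is
    randomised (jitter) so many clients do not retry in lockstep.
    """
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            sleep(random.uniform(0, delay))


# Stand-in for a flaky cloud API call: fails twice, then succeeds.
calls = {"count": 0}


def flaky_create():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("throttled")
    return {"id": "res-1"}


# Pass a no-op sleep so the demonstration does not actually wait.
result = with_backoff(flaky_create, sleep=lambda seconds: None)
print(result, calls["count"])
```

| State tracking and dependency ordering would each need similar care,
| which is the "reinventing the wheel" cost being described.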
| dharmab wrote:
| Yes, we did have to implement our own state tracking,
| retries/recovery, etc- but since we were focused on a limited
| subset of the cloud API, this was pretty easy.
| Thaxll wrote:
| So you most likely re-implemented Terraform, but worse. Also,
| you could have added those missing features and re-used the
| TF engine; it's very simple to add a new API to an
| existing provider.
| goodpoint wrote:
| Same here. It's incredible how much effort developers are
| willing to put into popular "devops" tools when the job could be
| done faster with 200 lines of Python.
| iddqd wrote:
| It usually takes 1-2 years for AWS to roll out their latest
| updates to the regions I use and by then Terraform is stable.
| oneplane wrote:
| Would the time spent re-implementing a specialised Terraform
| subset be better spent simply maintaining a private branch of
| the AWS Provider? You can add your secret/special API without
| having to do all the other heavy lifting as well.
|
| This makes your own effort for customisation minimal, keeps
| your knowledge portable and because your added features can be
| separated in to different files and the provider API is stable
| you can also easily backport/fast-forward new changes.
| CSDude wrote:
| What new cloud features were not available in Terraform for
| months?
| dharmab wrote:
| Private preview features for partnered organizations. We had
| access up to 6 months before the public.
| pvtmert wrote:
| E.g. AWS Chatbot is not available in TF yet. TBH AWS hasn't even
| added it to their Go SDK, so I cannot blame TF. But anyway,
| that's one of the inherent problems of the TF plugin system.
|
| Compare that to kubectl, where you can write a plugin in bash/shell,
| set its execute bit, put it somewhere in your $PATH
| as kubectl-blabla and use it as "kubectl blabla".
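|
| The naming convention described above can be sketched as a simple
| search-path lookup. This is only an approximation of the idea, not
| kubectl's actual discovery code; `find_plugin` is a hypothetical
| helper, and the demonstration uses a temporary directory instead of
| the real $PATH.

```python
import os
import shutil
import stat
import tempfile


def find_plugin(tool, subcommand, path=None):
    """Return the path of an executable named '<tool>-<subcommand>' found
    on the search path, or None if there is no such plugin."""
    return shutil.which(f"{tool}-{subcommand}", path=path)


# Demonstration in a temporary directory instead of the real $PATH.
plugin_dir = tempfile.mkdtemp()
plugin = os.path.join(plugin_dir, "kubectl-blabla")
with open(plugin, "w") as f:
    f.write("#!/bin/sh\necho hello from plugin\n")
os.chmod(plugin, os.stat(plugin).st_mode | stat.S_IXUSR)  # set execute bit

found = find_plugin("kubectl", "blabla", path=plugin_dir)
missing = find_plugin("kubectl", "nope", path=plugin_dir)
print(found, missing)
```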
| CSDude wrote:
| It's not fair to compare imperative simple shell scripts
| with the things Terraform does. It has schema validation,
| state comparison, retries, failure handlers etc.
|
| Also, just as you can write extensions to kubectl, you can
| write your own provider in Terraform if it does not exist.
| See https://registry.terraform.io/modules/waveaccounting/chatbot...
|
| Also, Chatbot does not have a public API; that's why it can
| only be configured via CloudFormation. So the expectation is
| not fair either.
|
| I've seen CloudFormation get features years later, e.g.
|
| 2021 - https://aws.amazon.com/about-aws/whats-new/2021/05/amazon-dy...
| 2015 - https://aws.amazon.com/about-aws/whats-new/2015/07/amazon-dy...
| jen20 wrote:
| NAT Gateways is another notable feature that took
| CloudFormation months yet Terraform had on day 1.
|
| If you can configure something via CloudFormation you can
| integrate it via Terraform et al also, since they have
| resources representing CloudFormation stacks.
| lincler wrote:
| This! It's not like you can't go beyond what Terraform
| offers by default. Running CloudFormation stacks from
| Terraform is a neat way of working around missing
| APIs/integrations. And that's exactly what my team did
| when Terraform was missing a lot of Lambda
| functionality. We just declared the CloudFormation
| stack for the Lambdas and then called it from Terraform.
| jen20 wrote:
| There's no reason you can't do something similar with
| Terraform either - plugins speak GRPC and thus could be
| implemented in Python, with Node.js or with Rust.
|
| However, if AWS have not published metadata for a given
| service to be used across their various SDKs, it's hard to
| take that service particularly seriously, so I'm not sure
| I'd bother with this.
| bovermyer wrote:
| From the problems the author is having, it would appear that
| perhaps Pulumi would be the better choice in this case.
| rgoulter wrote:
| In the article, the problems the author discusses are:
|
| 1. Inconsistent behaviour between providers. e.g. if a resource
| has been destroyed since the last `terraform apply`, then some
| resources/providers would recreate the resource, others
| wouldn't. (Similarly, there's not a guarantee that the state
| after running `terraform apply` matches up with what's there,
| if the provider is happy with its state file).
|
| 2. The dependencies of already-applied resources can block
| `terraform apply` if the upstream API for these resources
| suffers an outage.
|
| 3. If a `terraform apply` applies some resources before
| failing, this can result in an inconsistent state. Either the
| resources need to be deleted, or imported.
|
| I'm not familiar with Pulumi; what aspects of these would
| Pulumi help with?
| kall wrote:
| 3 happens regularly and I don't see how the other two would
| really be different, since some pulumi providers are using
| terraform providers under the hood.
| p2t2p wrote:
| My experience is that terraform sucks. A lot. Yet everything else
| seems to suck even more.
| jrochkind1 wrote:
| > On terraform it's different, because of the tfstate. All the
| deployed elements are stored in the tfstate, re-running terraform
| won't update resources that are supposed to be in a specific
| state but are not.
|
| Huh, is that true?
|
| I'm just getting started with terraform but I assumed that was
| the idea of terraform (where it didn't happen would be a bug),
| and I think I had seen it happening for the few basic resources I
| have started out with (S3, cloudfront).
|
| If the state doesn't match the actual configuration of S3,
| terraform notices, and the plan is to make it so. No? Am I
| confused and it hasn't been doing this?
|
| Or is this inconsistent, true of some resources and not of
| others? That seems surprising. What's the idea?
| snom380 wrote:
| It's not true, and what you describe is correct.
|
| I can only assume that the author has used some provider that
| hasn't implemented this properly (helm, I believe, is one
| example), or that they've run into one of the cases where
| terraform treats configuration/attachments to a resource as a
| separate resource (e.g. IAM role vs attached/inline IAM
| policies).
| shatteredspace wrote:
| If Terraform was used for the deployment of the infrastructure,
| then state IS the actual configuration of the system.
|
| All that a plan does is evaluate what is going to change in the
| current Terraform state by performing a dry-run of the
| Terraform code that you have supplied.
|
| If you would actually like to make changes to the Terraform
| state based on what the Terraform code evaluated then you run a
| Terraform apply - which will, for the resources deployed via
| Terraform, update the configurations themselves and update the
| Terraform state by using the Terraform code as the instruction
| set.
|
| You can actually see this in action with plan and apply, as the
| output will show you +, -, and ~, where ~ marks settings that are
| going to change but are neither new configuration nor configuration
| to be removed.
|
| Edit: Learned from some other comments that Terraform has a
| 'refresh' command that will take deploy+n time configurations
| done outside of Terraform and sync those configurations with
| the state. This might be what you ideally are looking for after
| deployments?
| jrochkind1 wrote:
| Right. I guess I'm asking about what happens if state changed
| outside of terraform.
|
| I thought I had seen terraform correcting it (to match what
| terraform thinks it should be) in some cases.
|
| OP seems to suggest that in some cases it does and in other
| cases it doesn't. I am surprised if that is inconsistent and
| unpredictable, and would have expected terraform to (modulo
| bugs) either always or never do that. And am wondering what
| terraform's intent is with that.
| shatteredspace wrote:
| Made an edit to my original comment but it may help to be
| here also.
|
| 'terraform refresh' may be what you are looking for. This
| will update the state to match current configurations that
| may have been done outside of Terraform.
| snom380 wrote:
| From
| https://www.terraform.io/docs/cli/commands/refresh.html :
|
| > You shouldn't typically need to use this command,
| because Terraform automatically performs the same
| refreshing actions as a part of creating a plan in both
| the terraform plan and terraform apply commands.
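|
| A toy model of the refresh-then-plan behaviour quoted above, using
| the +, - and ~ markers mentioned earlier in the thread. Everything
| here is a simplification: `refresh` and `plan` are hypothetical
| functions over plain dictionaries, and real providers decide for
| themselves how drift is detected.

```python
def refresh(state, actual):
    """Update recorded state from what really exists (drift detection)."""
    return {name: actual[name] for name in state if name in actual}


def plan(config, state):
    """Diff desired config against state: + create, - destroy, ~ update."""
    actions = {}
    for name in sorted(config.keys() | state.keys()):
        if name not in state:
            actions[name] = "+"
        elif name not in config:
            actions[name] = "-"
        elif config[name] != state[name]:
            actions[name] = "~"
    return actions


config = {"bucket": {"versioning": True}, "cdn": {"ttl": 60}}
state = {"bucket": {"versioning": True}, "cdn": {"ttl": 60},
         "old_queue": {"fifo": False}}
# Someone changed the bucket and deleted the CDN by hand in the console.
actual = {"bucket": {"versioning": False}, "old_queue": {"fifo": False}}

stale_plan = plan(config, state)                    # without refresh
fresh_plan = plan(config, refresh(state, actual))   # refresh first
print(stale_plan)
print(fresh_plan)
```

| Without the refresh step the manual changes are invisible and only
| the queue removal is planned; with it, the bucket is scheduled for an
| update and the CDN for re-creation. A provider that skips or botches
| the refresh would behave like the first case.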
| jrochkind1 wrote:
| I'm talking about fixing the external actual
| configuration that has diverged, to match what terraform
| config wants it to be.
|
| However, snom380 says this is what terraform is intended
| to do, and does with properly implemented providers,
| which makes sense to me.
|
| I'm not sure if you are talking about the same things. We
| -- me, snom380, and the original quote I made from OP --
| are talking about what happens when external actually
| existing live resources have diverged from terraform's
| state. I understand what you said that under a "perfect"
| situation, this would not happen. But it does sometimes,
| for various reasons. What the original quote from OP is
| talking about, and what I'm talking about, is what happens
| when it does. I think maybe you are talking about
| something else.
___________________________________________________________________
(page generated 2021-09-19 23:02 UTC)