https://blog.yossarian.net/2023/09/22/GitHub-Actions-could-be-so-much-better

ENOSUCHBLOG
Programming, philosophy, pedaling.

---------------------------------------------------------------------

GitHub Actions could be so much better

Sep 22, 2023    Tags: programming, rant, workflow

---------------------------------------------------------------------

I love GitHub Actions: I've been a daily user of it since 2019 for both professional and hobbyist projects, and have found it invaluable to both my overall productivity and peace of mind. I'm just old enough to have used Travis CI et al. professionally before moving to GitHub Actions, and I do not look back with joy^1.

By and large, GitHub Actions continues to delight me and grow new features that I appreciate: reusable workflows, OpenID Connect, job summaries, integrations into GitHub Mobile, and so forth.

At the same time, GitHub Actions is a regular source of profound frustration and time loss^2 in my development processes. This post lists some of those frustrations, and how I think GitHub could selfishly^3 improve on them (or even fix them outright)^4.

---------------------------------------------------------------------

Debugging like I'm 15 again

Here's a pretty typical session of me trying to set up a release workflow on GitHub Actions:

[github-act]

In this particular case, it took me 4 separate commits (and 4 failed releases) to debug the various small errors I made: not using ${{ ... }}^5 where I needed to, forgetting a needs: relationship, &c.

Here's another (this time of a PR-creating workflow), from a few weeks later:

[github-act]

I am not the world's most incredible programmer; like many (most?), I program intuitively and follow the error messages until they stop happening. GitHub Actions is not responsible for catching every possible error I could make, and ensuring that every workflow I write will run successfully on the first try.

At the same time, the current debugging cycle in GitHub Actions is ridiculous: even the smallest change on the most trivial workflow is a 30+ second process of tabbing out of my development environment (context switch #1), digging through my browser for the right tab (context switch #2), clicking through the infernal nest of actions summaries, statuses, &c. (context switch #3), and impatiently refreshing a buffered console log to figure out which error I need to fix next (context switch #4). Rinse and repeat.

Fixing this

* Give us an interactive debugging shell, or (at least) let us re-run workflows with small changes without having to go through a git add; git commit; git push cycle^6.

* Give us a repository setting to reject commits with obviously invalid workflows (things like syntax that can't possibly work, or references to jobs/steps that don't exist). It's infuriating when I git push a workflow that silently fails because of invalid YAML, especially when I then merge that workflow's branch under the mistaken impression that the workflow is passing, rather than not running at all.

Security woes

Speaking from experience: it's shockingly easy to wreck yourself with GitHub Actions. Way easier than it should be.

Here is just a small handful of the ways in which I have personally written potentially vulnerable workflows over the past few years:

1. Using the ${{ ... }} expansion syntax in a shell or other context where a (potentially malicious) user controls the expansion's contents. The following, for example, would allow a user to inject code that could then exfiltrate $MY_IMPORTANT_SECRET:

       - name: do something serious
         run: |
           # UNSAFE: inputs.frob is substituted into the script text before the shell runs
           something-serious "${{ inputs.frob }}"
         env:
           MY_IMPORTANT_SECRET: ${{ secrets.MY_IMPORTANT_SECRET }}

   Some among you will observe that a good programmer would simply know not to do this, and that a bad programmer would eventually learn their (painful) lesson.
   This might be an acceptable position for a niche piece of software to hold; it is not an acceptable position for the CI/CD platform that, to a first approximation, hosts the entire open source ecosystem.

2. Using pull_request_target. As far as I can tell, it's practically impossible to use this event safely in a non-trivial workflow^7. This event appears to exist for an extremely narrow intended use case, i.e. labeling or commenting on PRs that come from forks. I don't understand why GitHub Actions chooses to expose such a (relatively) simple operation through as massive a foot-gun as pull_request_target.

3. Over-scoping my workflow and job-level permissions. The default access set for Actions' ordinary GITHUB_TOKEN is very permissive: the only thing it doesn't provide access to is the workflow's OpenID Connect token. This consistently bites me in two different ways:

   1. I consistently forget to down-scope the default token, especially when working with repositories under my personal account (rather than under an org, where the default scope can be reduced across all repositories).

   2. I consistently over-scope my tokens because I don't know exactly how much access my workflow will need. This is further complicated by the messy ways in which GitHub's permission model gets shoehorned into a single permissions dimension of read/write/none: why does id-token: write grant me the ability to read the workflow's OpenID Connect token? Why do some GET operations on security advisories require write, while others only require read?

There are also a few things that I haven't done^8, but that are scary enough that I think they're worth mentioning. For example, can you see what's wrong with this workflow step?

    steps:
      - uses: actions/checkout@c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e

Despite all appearances, SHA ref c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e is not a commit on the actions/checkout repository!
It's actually a commit on a fork in actions/checkout's network which, thanks to GitHub's use of alternates, appears to belong to the parent repository.

Chainguard has an excellent post on this^9, but to summarize:

1. SHA references from forks are visually indistinguishable from SHA references in the intended target repository. The only way to tell the two apart is to manually inspect each reference and confirm that it appears on the expected repository, and not one of its forks.

2. GitHub's own REST API makes no distinction between SHA references in a repository graph -- /repos/{user}/{repo}/commits/{ref} returns a JSON response that only references {user}/{repo}, even if {ref} is only on a fork.

3. Because GitHub fails to distinguish between fork and non-fork SHA references, forks can bypass security settings on GitHub Actions that would otherwise restrict actions to only "trusted" sources (such as GitHub themselves or the repository's own organization).

GitHub's response to this (so far) has been to add a little bit of additional language to their documentation, rather than to forbid misleading SHA references outright.

Fixing this

* Give us push-time rejection of obviously insecure workflows. In other words: let us toggle^10 a "paranoid workflow security" mode that, when enabled, causes git push to fail with an explanation of what I'm doing wrong. Essentially the same thing as the debugging request above, but for security!

* Give us runtime checks on our workflows, analogous to runtime instrumentation like AddressSanitizer in the world of compiled languages. There are so many things that could be turned into hard failures for security wins without breaking 99.9% of legitimate users, like failing any attempt to use actions/checkout on a pull_request_target with a ref that isn't from the targeted repository.

* Maybe just deprecate and remove pull_request_target entirely.
  GitHub's own Security Lab has been aware of how dangerous this event is for years; maybe it's time to get rid of it entirely.

* Allow us to set a more restrictive default token scope on our personal repositories, similar to how organizations and enterprises can restrict their default GITHUB_TOKEN scopes across all repositories at once.

* By default, reject any SHA-pinned action for which the SHA only appears on a fork and not the referenced repository. It's hard to imagine a legitimate reason to ever need to do this!

Real types would be nice

When writing a custom GitHub Action, you can specify the action's inputs using a mapping under the inputs: key. For example, the following defines a frobulation-level input with a description (used for tooltips in many IDEs) and a default value:

    inputs:
      frobulation-level:
        description: "the level to frobulate at"
        default: "1"

Notably, this syntax does not allow for type enforcement; the following does not work:

    inputs:
      frobulation-level:
        description: "the level to frobulate to"
        default: 1
        # NOTE: this SHOULD cause a workflow failure if the input
        # isn't a valid number, but doesn't
        type: number

This absence is strange, but what makes it bizarre is that GitHub is inconsistent about where types can appear in actions and workflows:

* workflow_call supports type with boolean, number, or string
* workflow_dispatch supports type with boolean, choice, number, or string
* Action inputs: no types at all

Unfortunately, this is only the first level: even inputs that do support typing don't support compound data structures, like lists or objects.
For example, neither of the following works:

    - uses: example/example
      with:
        # INVALID: can't use arrays as inputs
        paths: [foo, bar, baz]
        # INVALID: can't use objects as inputs
        headers:
          foo: bar
          baz: quux

...which means that action writers end up requiring users to do silly things like these:

    - uses: example/example
      with:
        # SILLY: action does ad-hoc CSV-ish parsing
        paths: foo,bar,baz
        # SILLY: action forcefully flattens a natural hierarchy
        header-foo: bar
        header-baz: quux

This is bad for maintainability, and bad for security: maintainability because actions must carefully manage a single flat namespace of inputs (with no types!), and security because both action writer and workflow writer are forced into ad-hoc, unspecified languages for complex inputs.

Fixing this

* Let action and workflow writers use type: everywhere, and let us use choice everywhere -- not just in workflow_dispatch!

* Give us stricter type-checking. Where action and workflow types can be inferred statically, detect errors and reject incorrectly typed workflow changes at push time, rather than waiting for the workflow to inevitably fail.

* Give us type: object and type: array types. These won't be perfect to start with (thanks to potentially heterogeneous interior types), but they'll be a significant improvement over the status quo. Implementation-wise, forward these as JSON-serialized strings or something similar^11 where appropriate (such as in auto-created INPUT_{WHATEVER} environment variables).

(More) official actions would be nice

The third-party ecosystem on GitHub Actions is great: there are a lot of high-quality, easy-to-use actions being maintained by open source contributors. I maintain a handful of them!

Beneath the surface of these excellent third-party actions is a substrate of official, GitHub-maintained actions. These actions primarily address three classes of fundamental CI/CD activities:

1. Core git operations: actions/checkout
2. Core GitHub operations and repository housekeeping: actions/{upload,download}-artifact, actions/cache, actions/stale
3. General (but essential) configuration: actions/setup-python, actions/setup-node

These classes are somewhat distinct from "higher-level" workflows (like the kind I write): because of their centrality and universal demand, they benefit from singular, high-quality, officially maintained implementations.

And so, the question: why are there so few of them?

Here is just a smattering of the official actions that don't exist:

1. Programmatically adding a pull request to a merge queue. GitHub has the machinery to support this: gh pr merge already exists. It just isn't exposed as an action; users are (presumably) expected to piece it together themselves.

Even worse, there are actions that did exist but were deprecated (generally for unclear reasons^12):

1. actions/create-release: unmaintained as of March 2021. Users encouraged to switch to various community-maintained workflows, most notably^13 softprops/action-gh-release.
2. actions/upload-release-asset: marked as unmaintained at the same time as actions/create-release.
3. actions/setup-ruby: unmaintained as of February 2021. Users encouraged to switch to ruby/setup-ruby.

I'm sympathetic to the individual maintainers here and, in each case, the transition to a "recommended" third-party action was relatively painless. Still, the overall impression given here is unmistakable: that GitHub does not see official actions for its own platform features (or key ecosystem users, like Ruby) as priorities, and would rather have the community develop and choose unofficial favorites.

This is not unreasonable on a strategic level (it induces third-party development in their ecosystem), but has a deleterious effect on trust in the platform. I'd like to be able to write workflows and know that they'll run (with minimal changes) 5 years from now, and not worry that GitHub has abandoned core pieces underneath me!
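
To make the "piece it together themselves" case concrete, here is a minimal sketch of what a hand-rolled merge-queue step might look like. Everything in it is an assumption for illustration: the workflow name, the workflow_dispatch trigger, the pr input, and the exact permission scopes are hypothetical, and it assumes the base branch is configured to require a merge queue (in which case gh pr merge needs no merge-strategy flag). It also passes the input through the environment rather than expanding it inside the script, and down-scopes the default token, per the security concerns earlier in this post.

```yaml
# Hypothetical workflow: the name, trigger, and input are illustrative only.
name: enqueue-pr

on:
  workflow_dispatch:
    inputs:
      pr:
        description: "number of the pull request to enqueue"
        required: true
        type: number

# Down-scope the default token. The exact scopes needed are an assumption;
# adjust them to whatever your repository actually requires.
permissions:
  contents: write
  pull-requests: write

jobs:
  enqueue:
    runs-on: ubuntu-latest
    steps:
      - name: add the pull request to the merge queue
        # The input reaches the shell as an environment variable, so it is
        # never spliced into the script text via ${{ ... }}.
        run: gh pr merge "$PR" --repo "$GITHUB_REPOSITORY"
        env:
          GH_TOKEN: ${{ github.token }}
          PR: ${{ inputs.pr }}
```

None of this is hard, but every workflow author has to rediscover the token plumbing, the permission set, and the quoting rules on their own.
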
Apart from imparting a general feeling of shabbiness, this compounds with GitHub Actions' poor security story (per above): not providing official high-quality actions for their own API surfaces means that users will continue to make exploitable security mistakes in their workflows. Nobody wins^14.

Fixing this

* Give us more official actions. As a very rough rule of thumb: if a thing directly ties different pieces of GitHub infrastructure together and currently needs to be done manually (with REST API calls, gh invocations, or whatever else), it probably deserves a full official action!

* Give us more pseudo-official actions. Work with the biggest third-party actions^15 to form a community-actions (or whatever) org, with the expectation that actions homed under that org have been reviewed (at some point) by GitHub, are forced to adhere to best practices for repository security, receive semantically versioned updates, &c &c.

Wrap-up

This is a long and meandering post, and many parts are in conflict: security and stability (in the form of more official actions that break less often), for example, are in eternal conflict with each other.

I'm just one user, and I don't expect my interests or frustrations to be overriding ones. Still, I hope that the problems (and potential fixes) above aren't unique to me, and that there are engineers at GitHub who (again, selfishly!) share these concerns and would like to see them fixed.

---------------------------------------------------------------------

1. In large part because, at GitHub's size, I worry much less about private equity enshittifying it.

2. Just enough for it to really hurt, against the backdrop of GitHub Actions' overall productivity benefits.

3. In the sense that these things would be in GitHub's own self-interest, making GHA even more appealing to developers, further cementing its dominance in the CI/CD space, &c. They should do these things for their own sake!

4. After finishing this post, I discovered that GitHub has a public roadmap for Actions features. Maybe some of my grievances are already known and listed here; it's a big roadmap!

5. Completely unrelated to this post: writing ${{ ... }} is remarkably painful in a Liquid-rendered Jekyll blog.

6. Yes, I know this fundamentally breaks the GitHub Actions data model; I didn't say it would be easy!

7. In the sense that "using pull_request_target safely" means being confident that you never accidentally run anything from the pull request that just triggered your workflow.

8. And that, I think, haven't been done to me.

9. Which I stole the actions/checkout example from, since I was too lazy to make my own.

10. Even better, make it the default, and require people to click through a "destructive action" modal similar to the ones for other dangerous user or repository setting changes.

11. JSON is a semi-obvious choice here, since GitHub Actions already has a fromJSON(...) function and maps cleanly from YAML.

12. The primary stated reason is time, leading to the revelation that these critical actions were side projects. That isn't these engineers' fault; they seem to have been making the best out of a bad situation! But it's incredible to see GitHub, organizationally, squander so much value and community goodwill here.

13. In my opinion. It seems to have the most users and most activity, although it's bonkers that I'm evaluating something as critical as this based on those kinds of weak proxy signals.

14. Except for the pentesting industrial complex.

15. Off the top of my head: actions like ruby/setup-ruby, shivammathur/setup-php, and peaceiris/actions-gh-pages (among others) have hundreds of thousands of active users, and form a critical part of the Actions ecosystem. They should be treated as such!

---------------------------------------------------------------------