https://blog.yossarian.net/2023/09/22/GitHub-Actions-could-be-so-much-better

ENOSUCHBLOG

Programming, philosophy, pedaling.

---------------------------------------------------------------------

GitHub Actions could be so much better

Sep 22, 2023     Tags: programming, rant, workflow    

---------------------------------------------------------------------

I love GitHub Actions: I've been a daily user of it since 2019 for
both professional and hobbyist projects, and have found it invaluable
to both my overall productivity and peace of mind. I'm just old
enough to have used Travis CI et al. professionally before moving to
GitHub Actions, and I do not look back with joy^1.

By and large, GitHub Actions continues to delight me and grow new
features that I appreciate: reusable workflows, OpenID Connect, job
summaries, integrations into GitHub Mobile, and so forth.

At the same time, GitHub Actions is a regular source of profound
frustration and time loss^2 in my development processes. This post
lists some of those frustrations, and how I think GitHub could
selfishly^3 improve on them (or even fix them outright)^4.

---------------------------------------------------------------------

Debugging like I'm 15 again

Here's a pretty typical session of me trying to set up a release
workflow on GitHub Actions:

[github-act]

In this particular case, it took me 4 separate commits (and 4 failed
releases) to debug the various small errors I made: not using ${{ ...
}}^5 where I needed to, forgetting a needs: relationship, &c.

Here's another (this time of a PR-creating workflow), from a few
weeks later:

[github-act]

I am not the world's most incredible programmer; like many (most?), I
program intuitively and follow the error messages until they stop
happening.

GitHub Actions is not responsible for catching every possible error I
could make, and ensuring that every workflow I write will run
successfully on the first try.

At the same time, the current debugging cycle in GitHub Actions is
ridiculous: even the smallest change on the most trivial workflow is
a 30+ second process of tabbing out of my development environment
(context switch #1), digging through my browser for the right tab
(context switch #2), clicking through the infernal nest of actions
summaries, statuses, &c. (context switch #3), and impatiently
refreshing a buffered console log to figure out which error I need to
fix next (context switch #4). Rinse and repeat.

Fixing this

  * Give us an interactive debugging shell, or (at least) let us
    re-run workflows with small changes without having to go through
    a git add; git commit; git push cycle^6.

  * Give us a repository setting to reject commits with obviously
    invalid workflows (things like syntax that can't possibly work,
    or references to jobs/steps that don't exist). It's infuriating
    when I git push a workflow that silently fails because of invalid
    YAML; especially when I then merge that workflow's branch under
    the mistaken impression that the workflow is passing, rather than
    not running at all.

Security woes

Speaking from experience: it's shockingly easy to wreck yourself with
GitHub Actions. Way easier than it should be.

Here is just a small handful of the ways in which I have personally
written potentially vulnerable workflows over the past few years:

 1. Using the ${{ ... }} expansion syntax in a shell or other context
    where a (potentially malicious) user controls the expansion's
    contents. The following, for example, would allow a user to
    inject code that could then exfiltrate $MY_IMPORTANT_SECRET:

    - name: do something serious
      run: |
        something-serious "${{ inputs.frob }}"
      env:
        MY_IMPORTANT_SECRET: ${{ secrets.MY_IMPORTANT_SECRET }}

    Some among you will observe that a good programmer would simply
    know not to do this, and that a bad programmer would eventually
    learn their (painful) lesson. This might be an acceptable
    position for a niche piece of software to hold; it is not an
    acceptable position for the CI/CD platform that, to a first
    approximation, hosts the entire open source ecosystem.

 2. Using pull_request_target. As far as I can tell, it's practically
    impossible to use this event safely in a non-trivial workflow^7.

    This event appears to exist for an extremely narrow intended use
    case, i.e. labeling or commenting on PRs that come from forks. I
    don't understand why GitHub Actions chooses to expose such a
    (relatively) simple operation through as massive of a foot-gun as
    pull_request_target.

 3. Over-scoping my workflow and job-level permissions.

    The default access set for Actions' ordinary GITHUB_TOKEN is very
    permissive: the only thing it doesn't provide access to is the
    workflow's OpenID Connect token.

    This consistently bites me in two different ways:

     1. I consistently forget to down-scope the default token,
        especially when working with repositories under my personal
        account (rather than under an org, where the default scope
        can be reduced across all repositories).
     2. I consistently over-scope my tokens because I don't know
        exactly how much access my workflow will need.

        This is further complicated by the messy ways in which
        GitHub's permission model gets shoehorned into a single
        permissions dimension of read/write/none: why does id-token:
        write grant me the ability to read the workflow's OpenID
        Connect token? Why do some GET operations on security
        advisories require write, while others only require read?
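
The first and third items above can be mitigated by hand today. What
follows is a minimal sketch, not a definitive hardening recipe: the
workflow_dispatch trigger, the job name, and the FROB variable name
are assumptions made for illustration, while frob and
something-serious are carried over from the example above. The input
is handed to the shell through an environment variable instead of
being expanded into the script text, and the default token is
down-scoped explicitly.

```yaml
# Sketch only: the trigger, job name, and FROB are hypothetical.
on:
  workflow_dispatch:
    inputs:
      frob:
        description: "value passed to something-serious"
        required: true
        type: string

# Down-scope the default GITHUB_TOKEN for every job in this workflow.
permissions:
  contents: read

jobs:
  serious:
    runs-on: ubuntu-latest
    steps:
      - name: do something serious
        # The input reaches the shell as an environment variable, so the
        # shell treats it as data rather than as part of the script.
        run: |
          something-serious "${FROB}"
        env:
          FROB: ${{ inputs.frob }}
          MY_IMPORTANT_SECRET: ${{ secrets.MY_IMPORTANT_SECRET }}
```

Nothing forces a workflow author to write it this way, which is
rather the point: both the injection-prone and the safer spelling are
accepted without complaint.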

There are also a few things that I haven't done^8, but are scary
enough that I think they're worth mentioning.

For example, can you see what's wrong with this workflow step?

steps:
  - uses: actions/checkout@c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e

Despite all appearances, SHA ref
c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e is not a commit on the
actions/checkout repository! It's actually a commit on a fork in
actions/checkout's network which, thanks to GitHub's use of
alternates, appears to belong to the parent repository.

Chainguard has an excellent post on this^9, but to summarize:

 1. SHA references from forks are visually indistinguishable from SHA
    references in the intended target repository. The only way to
    tell the two apart is to manually inspect each reference and
    confirm that it appears on the expected repository, and not one
    of its forks.
 2. GitHub's own REST API makes no distinction between SHA references
    in a repository graph -- /repos/{user}/{repo}/commits/{ref}
    returns a JSON response that only references {user}/{repo}, even
    if {ref} is only on a fork.
 3. Because GitHub fails to distinguish between fork and non-fork SHA
    references, forks can bypass security settings on GitHub Actions
    that would otherwise restrict actions to only "trusted" sources
    (such as GitHub themselves or the repository's own organization).

GitHub's response to this (so far) has been to add a little bit of
additional language to their documentation, rather than to forbid
misleading SHA references outright.
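
In the meantime, the manual inspection described above can be roughly
approximated inside a workflow. The following is a sketch under
assumptions, not a vetted control: the job name and the ACTION_REPO
and PINNED_SHA variables are hypothetical, and it relies on a bare
clone fetching only the objects reachable from the upstream
repository's own branches and tags, so that a fork-only commit is
simply absent from the clone.

```yaml
# Sketch only: job and variable names are hypothetical.
on: workflow_dispatch

permissions: {}

jobs:
  check-pin:
    runs-on: ubuntu-latest
    steps:
      - name: check that the pinned SHA exists upstream
        env:
          ACTION_REPO: actions/checkout
          PINNED_SHA: c7d749a2d57b4b375d1ebcd17cfbfb60c676f18e
        run: |
          # Fetch only what the upstream's branches and tags can reach.
          git clone --quiet --bare "https://github.com/${ACTION_REPO}.git" upstream.git
          # Fail if the pinned commit is not among the fetched objects.
          if ! git -C upstream.git cat-file -e "${PINNED_SHA}^{commit}" 2>/dev/null; then
            echo "::error::${PINNED_SHA} was not found in ${ACTION_REPO}"
            exit 1
          fi
```

This costs a full clone per checked action, and it has to be repeated
for every pin, which is a good argument for the platform doing it
instead.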

Fixing this

  * Give us push-time rejection of obviously insecure workflows. In
    other words: let us toggle^10 a "paranoid workflow security" mode
    that, when enabled, causes git push to fail with an explanation
    of what I'm doing wrong. Essentially the same thing as the
    debugging request above, but for security!

  * Give us runtime checks on our workflows, analogous to runtime
    instrumentation like AddressSanitizer in the world of compiled
    languages. There are so many things that could be turned into
    hard failures for security wins without breaking 99.9% of
    legitimate users, like failing any attempt to use actions/
    checkout on a pull_request_target with a ref that isn't from the
    targeted repository.

  * Maybe just deprecate and remove pull_request_target entirely.
    GitHub's own Security Lab has been aware of how dangerous this
    event is for years; maybe it's time to get rid of it entirely.

  * Allow us to set a more restrictive default token scope on our
    personal repositories, similar to how organizations and
    enterprises can restrict their default GITHUB_TOKEN scopes across
    all repositories at once.

  * By default, reject any SHA-pinned action for which the SHA only
    appears on a fork and not the referenced repository. It's hard to
    imagine a legitimate reason to ever need to do this!

Real types would be nice

When writing a custom GitHub Action, you can specify the action's
inputs using a mapping under the inputs: key. For example, the
following defines a frobulation-level input with a description (used
for tooltips in many IDEs) and a default value:

inputs:
  frobulation-level:
    description: "the level to frobulate at"
    default: "1"

Notably, this syntax does not allow for type enforcement; the
following does not work:

inputs:
  frobulation-level:
    description: "the level to frobulate to"
    default: 1
    # NOTE: this SHOULD cause a workflow failure if the input
    # isn't a valid number, but doesn't
    type: number

This absence is strange, but what makes it bizarre is that GitHub is
inconsistent about where types can appear in actions and workflows:

  * workflow_call supports type with boolean, number, or string
  * workflow_dispatch supports type with boolean, choice, number, or
    string
  * Action inputs: no types at all
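
For contrast, here is a sketch of what the two typed contexts accept
today. The frobulation-mode input, its options, and the job itself
are hypothetical and exist only to show choice; frobulation-level is
borrowed from the action example above.

```yaml
# Sketch only: frobulation-mode and the job are hypothetical.
on:
  workflow_dispatch:
    inputs:
      frobulation-level:
        description: "the level to frobulate at"
        type: number
        default: 1
      frobulation-mode:
        description: "hypothetical input, only here to show choice"
        type: choice
        options:
          - fast
          - thorough
        default: fast
  workflow_call:
    inputs:
      frobulation-level:
        description: "the level to frobulate at"
        type: number
        default: 1

jobs:
  frobulate:
    runs-on: ubuntu-latest
    steps:
      - run: echo "frobulating at level ${LEVEL}"
        env:
          LEVEL: ${{ inputs.frobulation-level }}
```

The same type: keys placed under an action's inputs: are the ones
that have no effect.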

Unfortunately, this is only the first level: even inputs that do
support typing don't support compound data structures, like lists
or objects. For example, neither of the following works:

- uses: example/example
  with:
    # INVALID: can't use arrays as inputs
    paths: [foo, bar, baz]
    # INVALID: can't use objects as inputs
    headers:
      foo: bar
      baz: quux

...which means that action writers end up requiring users to do silly
things like these:

- uses: example/example
  with:
    # SILLY: action does ad-hoc CSV-ish parsing
    paths: foo,bar,baz
    # SILLY: action forcefully flattens a natural hierarchy
    header-foo: bar
    header-baz: quux

This is bad for maintainability, and bad for security:
maintainability because actions must carefully manage a single flat
namespace of inputs (with no types!), and security because both
action writer and workflow writer are forced into ad-hoc, unspecified
languages for complex inputs.

Fixing this

  * Let action and workflow writers use type: everywhere, and let us
    use choice everywhere -- not just in workflow_dispatch!

  * Give us stricter type-checking. Where action and workflow types
    can be inferred statically, detect errors and reject incorrectly
    typed workflow changes at push time, rather than waiting for the
    workflow to inevitably fail.

  * Give us type: object and type: array types. These won't be
    perfect to start with (thanks to potentially heterogeneous
    interior types), but they'll be a significant improvement over
    the status quo. Implementation-wise, forward these as
    JSON-serialized strings or something similar^11 where appropriate
    (such as in auto-created INPUT_{WHATEVER} environment variables).
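
The JSON-forwarding idea can already be approximated by hand with a
string input and fromJSON(...). This is a sketch of that workaround,
not of the proposed type: array: the paths input, the per-path job,
and the ITEM variable are hypothetical names.

```yaml
# Sketch only: input, job, and variable names are hypothetical.
on:
  workflow_dispatch:
    inputs:
      paths:
        description: "JSON-encoded list of paths"
        type: string
        default: '["foo", "bar", "baz"]'

jobs:
  per-path:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        # Decode the JSON string into a real list, one job per element.
        path: ${{ fromJSON(inputs.paths) }}
    steps:
      - run: echo "processing ${ITEM}"
        env:
          ITEM: ${{ matrix.path }}
```

It works, but the caller still has to hand-write JSON inside a YAML
string, and nothing validates it until the workflow runs.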

(More) official actions would be nice

The third-party ecosystem on GitHub Actions is great: there are a lot
of high-quality, easy-to-use actions being maintained by open source
contributors. I maintain a handful of them!

Beneath the surface of these excellent third-party actions is a
substrate of official, GitHub-maintained actions. These actions
primarily address three classes of fundamental CI/CD activities:

 1. Core git operations: actions/checkout
 2. Core GitHub operations and repository housekeeping: actions/
    {upload,download}-artifact, actions/cache, actions/stale
 3. General (but essential) configuration: actions/setup-python,
    actions/setup-node

These classes are somewhat distinct from "higher-level" workflows
(like the kind I write): because of their centrality and universal
demand, they benefit from singular, high-quality, officially
maintained implementations.

And so, the question: why are there so few of them?

Here is just a smattering of the official actions that don't exist:

 1. Programmatically adding a pull request to a merge queue. GitHub
    has the machinery to support this: gh pr merge already exists. It
    just isn't exposed as an action; users are (presumably) expected
    to piece it together themselves.
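
Pieced together, that might look something like the sketch below. The
label name, the job name, and the trigger are assumptions; it also
assumes a pull request from a branch in the same repository, a target
branch that requires a merge queue (so that gh pr merge needs no merge
strategy flag), and that the default token is permitted to enqueue,
which may depend on repository settings.

```yaml
# Sketch only: the label, job name, and trigger are hypothetical.
on:
  pull_request:
    types: [labeled]

permissions:
  contents: write
  pull-requests: write

jobs:
  enqueue:
    if: github.event.label.name == 'ready-to-merge'
    runs-on: ubuntu-latest
    steps:
      # gh reads its credentials from GH_TOKEN.
      - run: gh pr merge "${PR_URL}"
        env:
          GH_TOKEN: ${{ github.token }}
          PR_URL: ${{ github.event.pull_request.html_url }}
```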

Even worse, there are actions that did exist but were deprecated
(generally for unclear reasons^12):

 1. actions/create-release: unmaintained as of March 2021. Users
    encouraged to switch to various community-maintained workflows,
    most notably^13 softprops/action-gh-release.
 2. actions/upload-release-asset: marked as unmaintained at the same
    time as actions/create-release.
 3. actions/setup-ruby: unmaintained as of February 2021. Users
    encouraged to switch to ruby/setup-ruby.

I'm sympathetic to the individual maintainers here and, in each case,
the transition to a "recommended" third-party action was relatively
painless.

Still, the overall impression given here is unmistakable: that GitHub
does not see official actions for its own platform features (or key
ecosystem users, like Ruby) as priorities, and would rather have the
community develop and choose unofficial favorites. This is not
unreasonable on a strategic level (it induces third-party development
in their ecosystem), but has a deleterious effect on trust in the
platform. I'd like to be able to write workflows and know that
they'll run (with minimal changes) 5 years from now, and not worry
that GitHub has abandoned core pieces underneath me!

Apart from imparting a general feeling of shabbiness, this compounds
with GitHub Actions' poor security story (per above): not providing
official high-quality actions for their own API surfaces means that
users will continue to make exploitable security mistakes in their
workflows. Nobody wins^14.

Fixing this

  * Give us more official actions. As a very rough rule of thumb: if
    a thing directly ties different pieces of GitHub infrastructure
    together and currently needs to be done manually (with REST API
    calls, gh invocations, or whatever else), it probably deserves a
    full official action!

  * Give us more pseudo-official actions. Work with the biggest
    third-party actions^15 to form a community-actions (or whatever)
    org, with the expectation that actions homed under that org have
    been reviewed (at some point) by GitHub, are forced to adhere to
    best practices for repository security, receive semantically
    versioned updates, &c &c.

Wrap-up

This is a long and meandering post, and many parts are in conflict:
security and stability (in the form of more official actions that
break less often), for example, are in eternal conflict with each
other.

I'm just one user, and I don't expect my interests or frustrations to
be overriding ones. Still, I hope that the problems (and potential
fixes) above aren't unique to me, and that there are engineers at
GitHub who (again, selfishly!) share these concerns and would like to
see them fixed.

---------------------------------------------------------------------

 1. In large part because, at GitHub's size, I worry much less
    about private equity enshittifying it.

 2. Just enough for it to really hurt, against the backdrop of GitHub
    Actions' overall productivity benefits.

 3. In the sense that these things would be in GitHub's own
    self-interest, making GHA even more appealing to developers,
    further cementing its dominance in the CI/CD space, &c. They
    should do these things for their own sake!

 4. After finishing this post, I discovered that GitHub has a public
    roadmap for Actions features. Maybe some of my grievances are
    already known and listed here; it's a big roadmap!

 5. Completely unrelated to this post: writing ${{ ... }} is
    remarkably painful in a Liquid-rendered Jekyll blog.

 6. Yes, I know this fundamentally breaks the GitHub Actions data
    model; I didn't say it would be easy!

 7. In the sense that "using pull_request_target safely" means being
    confident that you never accidentally run anything from the pull
    request that just triggered your workflow.

 8. And, I think, haven't been done to me.

 9. Which I stole the actions/checkout example from, since I was too
    lazy to make my own.

10. Even better, make it the default, and require people to click
    through a "destructive action" modal similar to the ones for
    other dangerous user or repository setting changes.

11. JSON is a semi-obvious choice here, since GitHub Actions already
    has a fromJSON(...) function and maps cleanly from YAML.

12. The primary stated reason is time, leading to the revelation that
    these critical actions were side projects. That isn't these
    engineers' fault; they seem to have been making the best out of a
    bad situation! But it's incredible to see GitHub,
    organizationally, squander so much value and community goodwill
    here.

13. In my opinion. It seems to have the most users and most activity,
    although it's bonkers that I'm evaluating something as critical
    as this based on those kinds of weak proxy signals.

14. Except for the pentesting industrial complex.

15. Off the top of my head: actions like ruby/setup-ruby,
    shivammathur/setup-php, and peaceiris/actions-gh-pages (among
    others) have hundreds of thousands of active users, and form a
    critical part of the Actions ecosystem. They should be treated as
    such!
