hngopher.com

       [HN Gopher] Unexpected security footguns in Go's parsers
       ___________________________________________________________________
        
       Unexpected security footguns in Go's parsers
        
       Author : ingve
       Score  : 152 points
       Date   : 2025-06-18 11:35 UTC (3 days ago)
        
 (HTM) web link (blog.trailofbits.com)
 (TXT) w3m dump (blog.trailofbits.com)
        
       | anitil wrote:
       | This was all very interesting, but that polyglot json/yaml/xml
       | payload was a big surprise to me! I had no idea that go's default
       | xml parser would accept preceding and trailing garbage. I'd
       | always thought of json as one of the simpler formats to parse,
       | but I suppose the real world would beg to differ.
       | 
       | It's interesting that decisions made about seemingly-innocuous
       | conditions like 'what if there are duplicate keys' have a long
       | tail of consequences
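
A minimal sketch of the XML leniency described above, assuming the behavior reported in the linked article: `xml.Unmarshal` skips text before the first start element and does not inspect anything after the root element closes. The `Request` type and the `parseAction` helper are hypothetical names made up for this illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Request is a hypothetical payload type used only for this illustration.
type Request struct {
	Action string `xml:"Action"`
}

// parseAction decodes input with xml.Unmarshal and returns the Action value.
func parseAction(input string) (string, error) {
	var r Request
	if err := xml.Unmarshal([]byte(input), &r); err != nil {
		return "", err
	}
	return r.Action, nil
}

func main() {
	// Non-XML text surrounds the single root element.
	input := `leading junk <Request><Action>delete</Action></Request> trailing junk`
	action, err := parseAction(input)
	fmt.Printf("action=%q err=%v\n", action, err)
}
```

If a service needs to reject such input, it likely has to check for surrounding data itself, since the decoder stops once the root element is complete.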
        
       | neuroelectron wrote:
       | Been seeing these same problems in services for decades now. It's
       | almost like they made these protocol languages exploitable on
       | purpose.
        
         | v5v3 wrote:
         | Indeed...
        
         | CactusRocket wrote:
         | I think it's just kinda dumb parsing. E.g. JSON is an extremely
         | simple spec. Most of those issues that the Go JSON parser has,
         | are because of specific choices of the Go implementation, not
         | about JSON. The fact that it allows case-insensitive key
         | matching is just insane. Also that it parses invalid XML
         | documents (with garbage) into valid structs without returning
         | an error is very much a problem with the parser and not with
         | XML.
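
A small sketch of the case-insensitive key matching discussed here, using the standard `encoding/json` package. The `Login` type and `decodeUser` helper are hypothetical. `encoding/json` prefers an exact key match but falls back to a case-insensitive one, and when several keys map to the same field the last one decoded wins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Login is a hypothetical request type with a lowercase JSON key.
type Login struct {
	User string `json:"user"`
}

// decodeUser unmarshals input and returns the decoded User field.
func decodeUser(input string) (string, error) {
	var l Login
	if err := json.Unmarshal([]byte(input), &l); err != nil {
		return "", err
	}
	return l.User, nil
}

func main() {
	// The key differs from the tag only by case, yet still fills the field.
	fmt.Println(decodeUser(`{"USER": "alice"}`))
	// Two keys that fold to the same field: the later one overwrites.
	fmt.Println(decodeUser(`{"user": "alice", "uSeR": "mallory"}`))
}
```

This matters when another component in the request path (a proxy or a service in a different language) treats `user` and `uSeR` as distinct keys and validates only one of them.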
        
           | cesarb wrote:
           | > The fact that it allows case-insensitive key matching is
           | just insane.
           | 
           | It's probably a side effect of what is IMO another bad design
           | of that language: letter casing determining field visibility,
           | instead of using a keyword or a sigil. If your field has to
           | be named "User" to be public, and the corresponding entry in
           | the JSON has all-lowercase "user" as the key (probably
           | because the JSON was defined first, and most languages have
           | "field names start with lowercase" as part of their naming
           | conventions), you have to either ignore case when matching,
           | or manually map every field. They probably wanted to be
           | "intuitive" and not require manual mapping.
        
       | e_y_ wrote:
       | As someone who isn't a Go programmer, on the face of it using
       | strings (struct tags) for field metadata seems pretty backwards
       | compared to Rust macros (which parse the metadata at compile
       | time) or Java annotations (which are processed at runtime but at
       | least don't require parsing a string to break apart options).
       | 
       | The accidental omitempty and - are a good example of the
       | weirdness even if they might not cause problems in practice.
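
To make the "accidental omitempty and -" point concrete, here is a hedged sketch with `encoding/json`. The `Account` type and `decodeAccount` helper are hypothetical. The first comma-separated item in a `json` tag is the key name, so a tag of `omitempty` alone names the key "omitempty", and `-,` names the key "-" instead of skipping the field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Account is a hypothetical type; IsAdmin and Secret carry the mistaken tags.
type Account struct {
	Username string `json:"username"`
	// Probably meant `json:"isAdmin,omitempty"`. As written, the JSON key
	// for this field is "omitempty".
	IsAdmin bool `json:"omitempty"`
	// Probably meant `json:"-"` (always skip). With the trailing comma,
	// the JSON key for this field is literally "-".
	Secret string `json:"-,"`
	// The correct "always skip" form.
	Internal string `json:"-"`
}

// decodeAccount unmarshals input into an Account.
func decodeAccount(input string) (Account, error) {
	var a Account
	err := json.Unmarshal([]byte(input), &a)
	return a, err
}

func main() {
	a, err := decodeAccount(`{"username":"bob","omitempty":true,"-":"overwritten","Internal":"x"}`)
	fmt.Printf("%+v err=%v\n", a, err)
}
```

In this sketch a client can set both `IsAdmin` and `Secret`, while `Internal` stays untouched.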
        
         | kjksf wrote:
         | For some it's stupidity. For others it's brilliance.
         | 
         | It's one of many examples of 80/20 design in Go: 80% of
         | functionality with 20% of complexity and cost.
         | 
         | Struct tags address an important scenario in an easy to use
         | way.
         | 
         | But they don't try to address other scenarios, like annotations
         | do. They are not function tags. They're not variable tags. They
         | are not general purpose annotations. They are annotations for
         | struct fields and struct fields only.
         | 
         | Are they as powerful as annotations or macros? Of course
         | not, not even close.
         | 
         | Are they as complex to implement, understand, use? Also not.
         | 
         | 80/20 design. 80% of functionality at 20% of cost.
        
           | Philpax wrote:
           | Go's simplifications often introduce complexities elsewhere,
           | however, as this article demonstrates with the complexities
           | of correctness of a stringly-typed DSL.
           | 
           | There's no free lunch here, and the compromises Go makes to
           | achieve its outcomes have shown themselves to be error-prone
           | in ways that were entirely predictable at design time.
        
             | Kamq wrote:
             | > Go's simplifications often introduce complexities
             | elsewhere
             | 
             | It does occasionally, although I'll push back on the
             | "often". Go's simplifications allow most of the codebase to
             | be... well... simple.
             | 
             | This does come at the cost of some complexity on the edge
             | cases. That's a trade off I'm perfectly willing to make.
             | The weird parts being complex is something I'm willing to
             | accept in exchange for the normal parts being simple, as
             | opposed to constantly dealing with a higher amount of
             | complexity to make the edge cases easier.
             | 
             | > There's no free lunch here
             | 
             | This I'll agree with as well. The lunch is not free, but
             | it's very reasonably priced (like one of those hole in the
             | wall restaurants that serves food way too good for what you
             | pay for it).
             | 
             | > the compromises Go makes to achieve its outcomes have
             | shown themselves to be error-prone in ways that were
             | entirely predictable at design time.
             | 
             | I also agree here, although I see this as a benefit. The
             | things that are error prone are clear enough that they can
             | be seen at design time. There's no free lunch here either,
             | something has to be error prone, and I like the trade offs
             | that go has made on which parts are error prone. Adding
             | significant complexity to reduce those error prone places
             | has, in my experience, just increased the surface area of
             | the error prone sections of other languages.
             | 
             | Could you make the case that some other spot in design
             | space is a better trade-off? Absolutely, especially for a
             | particular problem. But this spot seems to work really well
             | for ~95% of things.
        
             | valenterry wrote:
             | > Go's simplifications often introduce complexities
             | elsewhere
             | 
             | Exactly this.
             | 
             | Basically: have a complex compression algorithm? Yes, it's
             | complex, but the resulting filesize (= program complexity)
             | will be low.
             | 
             | If you use a very basic compression algorithm, it's easier
             | to understand the algorithm, but the filesize will be much
             | bigger.
             | 
             | It's a trade-off. However, as professionals, I think we
             | should really strive to put time to properly learn the good
             | complex compression algorithm once and then benefit for all
             | the programs we write.
        
               | maleldil wrote:
               | > I think we should really strive to put time to properly
               | learn
               | 
               | [insert Pike's Google young programmers quote here]
               | 
               | That's just not the philosophy of the language. The
               | convention in Go is to be as obvious as possible, at the
               | cost of more efficient designs. Some people like it,
               | others don't. It bothers me, so I stopped using Go.
        
           | timeon wrote:
           | Starting to get the feeling that 80/20 design is not a good
           | thing. Many things seem to be driven by worse-is-better but,
           | looking at things like climate change... was it worth it?
        
         | reactordev wrote:
         | struct tags greatly reduce the boilerplate code required to map
         | fields to fields. It's really quite novel once you understand
         | it.
        
           | masklinn wrote:
           | > struct tags greatly reduce the boilerplate code required to
           | map fields to fields.
           | 
           | Are you somehow under the impression that Go is unique in
           | having a terse way to map fields to fields?
           | 
           | > It's really quite novel once you understand it.
           | 
           | It's the opposite of novel, putting ad-hoc annotations in
           | unstructured contexts is what people used to do before java
           | 5.
        
             | reactordev wrote:
             | No, not at all, but considering C has no such thing, I'll
             | take it.
        
           | jlouis wrote:
           | It's not very novel. There's far better ways of solving this
           | than allowing a random string to be embedded as aux
           | information to a struct field. Examples: F# type providers,
           | or OCaml's PPX system for extending the language in a well
           | defined way. Macro rewriting systems also allow for better
           | safety in this area.
           | 
           | This allows you to derive a safe parser from the structural
           | data, and you can make said parser be really strict. See
           | e.g., Wuffs or Langsec for examples of approaches here.
        
             | reactordev wrote:
             | I'm not disagreeing that there are better ways to solve
             | this given how other languages have implemented theirs but
             | considering the constraints they had at the time the Go
             | team designed this, it allowed them to implement marshaling
             | fairly easily and leaves it open for extensions by the
             | community.
        
               | maleldil wrote:
               | > considering the constraints they had at the time the Go
               | team designed this
               | 
               | What constraints? Ignoring decades of programming
               | language developments since C89?
        
         | xnorswap wrote:
         | As a .net programmer, the "stringly typed" nature of the
         | metadata horrifies me, but the choices of Go have long confused
         | me.
         | 
         | So in .NET, like Java as you mention, we have attributes.
         | 
         | e.g.
         | 
         |     [JsonPropertyName("username")]
         |     [JsonIgnore]
         | 
         | etc.
         | 
         | This is simple, and obvious. The JsonPropertyName attribute is
         | an override, you can set naming policies for the whole class.
         | camelCase by default, with kebab-case, snake_case etc as
         | alternative defaults.
         | 
         | C#/.NET of course has the benefit of having public properties,
         | which are serialised by default, and private properties, which
         | aren't, so you're unlikely to be exposing things you don't want
         | to expose.
         | 
         | This contrasts to Go's approach, much like python, of using
         | casing convention to determine private vs public fields. (
         | Please correct me if I'm wrong on this? )
         | 
         | The first example still confuses me though, because either you
         | want IsAdmin to come from the user, in which case you still
         | want to deserialise it, or you don't, in which case it
         | shouldn't even be in your DTO at all.
         | 
         | Deserialisation there is a bit of a red-herring, as there
         | should be a validation step which includes, "Does this user
         | have the rights to create an admin?".
         | 
         | The idea of having a user class, which gets directly updated
         | using properties straight from deserialized user input, feels
         | weird to me, but I'd probably be dismissed as an "enterprise
         | programmer" who wants to put layers between everything.
        
           | grey-area wrote:
           | Not just weird, it's dangerous to do this - too easy to miss
           | validation as fields are added over time. Better to
           | explicitly validate all fields IMO.
        
           | hmry wrote:
           | > This contrasts to Go's approach, much like python, of using
           | casing convention to determine private vs public fields. (
           | Please correct me if I'm wrong on this? )
           | 
           | I think calling it a convention is misleading.
           | 
           | In Python, you can access an _field just by writing
           | obj._field. It's not enforced, only a note to the user that
           | they shouldn't do that.
           | 
           | But in Go, obj.field is a compiler error. Fields that start
           | with a lowercase letter really are private, and this is
           | enforced.
           | 
           | So I think it's better to think of it as true private fields,
           | just with a... unique syntax.
        
           | masklinn wrote:
           | > This contrasts to Go's approach, much like python, of using
           | casing convention to determine private vs public fields. (
           | Please correct me if I'm wrong on this? )
           | 
           | Go actually ties visibility to casing, instead of using
           | separate annotations. And it will not serialise private
           | fields, only public.
           | 
           | Python has no concept of visibility at all, _conventionally_
           | you should not access attributes prefixed with `_` but it
           | won't stop you.
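
A short sketch of the visibility rule described here, using `encoding/json`. The `session` type and `encodeSession` helper are hypothetical. Fields whose names start with a lowercase letter are unexported, and the encoder does not see them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// session is a hypothetical type with one exported and one unexported field.
type session struct {
	UserID int `json:"user_id"`
	token  string // lowercase name: unexported, invisible to encoding/json
}

// encodeSession marshals s and returns the JSON text.
func encodeSession(s session) (string, error) {
	out, err := json.Marshal(s)
	return string(out), err
}

func main() {
	s := session{UserID: 7, token: "s3cret"}
	fmt.Println(encodeSession(s))
}
```

The flip side is the one raised elsewhere in the thread: every exported field is serialized unless it is explicitly excluded.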
        
             | maleldil wrote:
             | > conventionally you should not access attributes prefixed
             | with `_` but it won't stop you.
             | 
             | Any serious Python project will use at least one linter or
             | typechecker, which can easily enforce this.
        
         | grey-area wrote:
         | Yes they are a horrible idea for many reasons, not just
         | security. It's like a hidden ill-defined poorly understood dsl
         | in strings.
         | 
         | You can just not use them though - you can unmarshal to a map
         | instead and select the keys you want, perform validation etc
         | and then set the values.
         | 
         | Same when publishing - I prefer to have an explicit view which
         | defines the keys exposed rather than publishing all by
         | default based on these poorly understood string keys attached
         | to types.
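
One way the "unmarshal to a map, select keys, validate" approach could look, as a sketch only. `NewUser` and `parseNewUser` are hypothetical names, and the validation rules are placeholders. Map keys are compared exactly, so differently cased keys are rejected rather than silently matched.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// NewUser is a hypothetical internal type; it carries no struct tags.
type NewUser struct {
	Username string
	Password string
}

// parseNewUser decodes into a map, accepts only known keys, and validates.
func parseNewUser(body []byte) (NewUser, error) {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(body, &raw); err != nil {
		return NewUser{}, err
	}
	var u NewUser
	for key, val := range raw {
		var dst *string
		switch key { // exact comparison: "Username" does not match "username"
		case "username":
			dst = &u.Username
		case "password":
			dst = &u.Password
		default:
			return NewUser{}, fmt.Errorf("unexpected key %q", key)
		}
		if err := json.Unmarshal(val, dst); err != nil {
			return NewUser{}, fmt.Errorf("key %q: %w", key, err)
		}
	}
	if u.Username == "" || u.Password == "" {
		return NewUser{}, errors.New("username and password are required")
	}
	return u, nil
}

func main() {
	fmt.Println(parseNewUser([]byte(`{"username":"bob","password":"hunter2"}`)))
	fmt.Println(parseNewUser([]byte(`{"username":"bob","password":"x","isAdmin":true}`)))
}
```

One caveat of this sketch: duplicate keys still collapse inside the map, with the last occurrence kept, so it does not detect them.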
        
         | liampulles wrote:
         | As someone who is a regular Go programmer, yes struct tags do
         | suck. It gets even worse if you want to try and combine
         | multiple "annotations" into one struct tag string.
         | 
         | The reason it's like that is that Go philosophically is very
         | much against the idea of annotations and macros, and very
         | strongly about the idea of a clear upfront control flow, and
         | this is one of the reasons I love the language. But it does
         | come at the cost of a few highly useful usecases for
         | annotations (like mapping JSON and XML, etc.) becoming obtuse
         | to use.
         | 
         | The idea of more compile-time macros in Go is interesting to
         | me, but at the same time the ease of debugging and
         | understanding the Go control flow in my programs is one of the
         | reasons I love it so much, and I would not want to invite the
         | possibility of "magic" web frameworks that would inevitably
         | result from more metaprogramming ability in Go. So I guess I'm
         | prepared to live with this consequence. :/
        
           | valenterry wrote:
           | I understand your feeling. There are many magical frameworks
           | like e.g. Spring that do these things and it's super hard to
           | figure out what's going on.
           | 
           | The solution is usually to have an _even_ better language.
           | One, where the typesystem is so powerful, that such hacks are
           | not necessary. Unfortunately, that also means you have to
           | learn that typesystem to be productive in language, and you
           | have to learn it more or less upfront - which is not
           | something that Google wanted for golang due to the turnover.
        
           | masklinn wrote:
           | > The reason its like that is that Go philosophically is very
           | much against the idea of annotations and macros, and very
           | strongly about the idea of a clear upfront control flow
           | 
           | Annotations have no control flow, they just attach metadata
           | to items. The difference with struct tags being that that
           | metadata is structured.
        
       | octo888 wrote:
       | What is "IsAdmin" doing in the "create user" request DTO in the
       | first place? The examples seem to indicate inappropriate re-use
       | of data models.
       | 
       | Would it not be better to:
       | 
       |     type CreateUserRequest struct {
       |         Username string
       |         Password string
       |     }
       | 
       |     type UserView struct {
       |         Username string
       |         IsAdmin  bool
       |     }
       | 
       | etc?
       | 
       | No need to just have just 1 model that maps 1:1 to your DB row.
       | This applies to all languages
        
         | delusional wrote:
         | > No need to just have just 1 model that maps 1:1 to your DB
         | row. This applies to all languages
         | 
         | One reason is to avoid copying data constantly. I don't just
         | mean this from an efficiency perspective, but also (and maybe
         | more so) from a simplicity one. If you have a library for
         | shoving data into a struct mechanistically, but you then take
         | the data from that struct and shove it into an additional
         | struct, what's the point of the library? You're writing the
         | code to move the data anyway.
         | 
         | In my dayjob I see this tendency constantly to have a lot of
         | different very narrow structs that somehow integrate into some
         | library, and then a TON of supporting code to copy between
         | those structs. Only to then do very little actually useful work
         | with any of the data at the end. I generally think you'd be
         | happier with fatter structs that integrated less with weird
         | "struct-filling" libraries.
        
           | masklinn wrote:
           | > In my dayjob I see this tendency constantly to have a lot
           | of different very narrow structs that somehow integrate into
           | some library, and then a TON of supporting code to copy
           | between those structs.
           | 
           | Maybe that's the problem to solve, rather than exposing the
           | entire internal world to the outside? Because different views
           | of the same entities is pretty critical otherwise it's way
           | too easy to start e.g. returning PII to public endpoints
           | because some internal process needed it.
        
             | delusional wrote:
             | > exposing the entire internal world to the outside
             | 
             | That's not at all what I said.
             | 
             | You don't need a struct to avoid exposing internal data. If
             | you're building a JSON object, you can just not write the
             | code to format some fields out. You don't need a new data
             | layout for that.
        
               | masklinn wrote:
               | > If you're building a JSON object, you can just not
               | write the code to format some fields out.
               | 
               | Did you fail to read the article somehow?
               | 
               | Few if any modern language requires you to write the code
               | to format individual fields out. Even Go does not, though
               | when it comes to json most fields need to be annotated
               | because of the choices it made to tie casing and
               | visibility, but then it doesn't even require you to opt
               | the type itself into being serializable, every struct is
               | serializable by default.
               | 
               |  _Skipping_ fields on a serializable structure is what
               | requires extra work.
        
               | CactusRocket wrote:
               | That's the whole point of the first item in the article,
               | and the original comment you were replying to. In Go (and
               | some other languages) that formatting is implicit and
               | automatic. So you need to write code to NOT format the
               | fields out. Which leads to the potential security issues
               | where data is leaked, or you can inject data into
               | "hidden" fields. So since it's implicit and automatic,
               | it's safer to, as a rule, define separate structs for the
               | data input and map them, so that there is absolutely no
               | chance (implicit or explicit) to leak or inject data.
        
           | TeMPOraL wrote:
           | > _also (and maybe more so) from a simplicity one. If you
           | have a library for shoving data into a struct
           | mechanistically, but you then take the data from that struct
           | and shove it into an additional struct, what's the point of
           | the library? You're writing the code move the data anyway._
           | 
           | Super annoying if you need to do it by hand, and wastes
           | compute and memory if you actually need to do copies of
           | copies, but _this is the mapping part_ of "object relational
           | mapping", the M in ORM. Skipping it is a bad idea.
           | 
           | Your business/domain model should _not_ be tied directly to
           | your persistence model. It's a common mistake that's
           | responsible for like half of the bad rep ORMs get. Data
           | structures may look superficially similar, but they represent
           | different _concepts_ with different semantics and
           | expectations. If you skip on that, you'll end up with tons
           | of stupid mistakes like 'masklinn mentions, and more subtle
           | bugs when the concepts being squashed together start pulling
           | in opposite directions over time.
        
           | homebrewer wrote:
           | I don't know about Go, but in Java and .NET world this is
           | trivially solvable with libraries like MapStruct. If you have
           | a model with 20 fields and need to create a tiny slice of it
           | (with let's say three fields), you need a few lines of
           | boilerplate: create a record with those fields (1-3 LOC):
           | 
           |     record Profile(int id, String login, boolean isAdmin) {}
           | 
           | create a mapper for it:
           | 
           |     interface UserMapper {
           |       // arrays are just one example, plain models and plenty
           |       // of other data structures are supported
           |       Profile[] usersToProfiles(User[] user);
           | 
           |       // other mappers...
           |     }
           | 
           | and then use it:
           | 
           |     class UserController {
           |       //
           |       @GET("/profiles")
           |       Profile[] getUserProfiles() {
           |         var users = userRepo.getUsers();
           |         return userMapper.usersToProfiles(users);
           |       }
           |     }
           | 
           | As long as fields' names match, everything will be handled
           | for you. Adding another "view" of your users requires
           | creating that "view" (as a record or as a plain class) and
           | adding just one line to the mapper interface, even if that
           | class contains all User's fields but one. So no need to write
           | and maintain 19+ lines of copying data around.
           | 
           | It also handles nested/recursive entities, nulls, etc. It's
           | also using codegen, not reflection, so performance is exactly
           | the same as if you had written it by hand, and the code is
           | easy to read.
           | 
           | https://mapstruct.org
           | 
           | Go developers usually "don't need these complications", so
           | this is just another self-inflicted problem. Or maybe it's
           | solved, look around.
        
             | kgeist wrote:
             | >you have a model with 20 fields and need to create a tiny
             | slice of it (with let's say three fields), you need a few
             | lines of boilerplate: create a record with those fields
             | (1-3 LOC):
             | 
             | >create a mapper for it:
             | 
             | > ...
             | 
             | >Go developers usually "don't need these complications", so
             | this is just another self-inflicted problem.
             | 
             | In Go:
             | 
             |     type DTO struct {
             |         A, B, C string
             |     }
             | 
             | Somewhere in your API layer:
             | 
             |     // copy the fields to the DTO
             |     return DTO{A: o.A, B: o.B, C: o.C}
             | 
             | I fail to see where the "self-inflicted problem" is and why
             | it requires a whole library? (which seems to require around
             | the same number of lines of code at the end of the day, if
             | you count the imports, the additional mapper interface)
        
             | jerf wrote:
             | Actually it's even easier in Go with struct embedding:
             | 
             |     type SmallerThing struct {
             |         Id      int
             |         Login   string
             |         IsAdmin bool
             |     }
             | 
             |     type UserController struct {
             |         SmallerThing
             |         OtherField  Whatever
             |         OtherField2 SomethingElse
             |     }
             | 
             | In principle this could break down if you need super, super
             | complicated non-overlapping mappings, in practice I have
             | yet to need that.
        
           | kgeist wrote:
           | 1. As shown in the article, exposing the internal model to
           | APIs directly is a security footgun. I've seen sites get
           | hacked because the developer serialized internal objects
           | without any oversight just like in the article and
           | accidentally exposed secrets.
           | 
           | 2. Exposing internal models to APIs directly also makes it
           | hard to refactor code because refactoring would change APIs,
           | which would require updating the clients (especially
           | problematic when the clients are owned by other teams). I've
           | seen this firsthand too in a large legacy project, people
           | were afraid to refactor the code because whenever they tried,
           | it broke the clients downstream. So instead of refactoring,
           | they just added various complex hacks to avoid touching the
           | old core code (and of course, their models also mapped
           | directly to the UI).
           | 
           | In the end, codebases like that, with no layer separation,
           | become really hard to maintain and full of security problems.
           | 
           | All because they thought it was "simpler" to skip writing ~10
           | lines of extra boilerplate per model to map models to DTOs.
           | 
           | Lack of layer separation becomes a problem in the long term.
           | When you're just starting out, it may seem like
           | overengineering, but it isn't
        
             | delusional wrote:
             | > Lack of layer separation becomes a problem in the long
             | term. When you're just starting out, it may seem like
             | overengineering, but it isn't
             | 
             | I actually agree, but you're setting up a false dichotomy.
             | I do believe in strong layering at the interfaces, for
             | exactly the reasons you line up. What I don't believe in is
             | what I might call "struct annotation based parsing" at
             | those interfaces.
             | 
             | Typically, you don't want to pass DTOs around your code.
             | Usually, you take in that struct, and then immediately have
             | some code to poke it into the appropriate places in your
             | actual data structures. It's very often much easier to
             | simply take a well structured but more direct and generic
             | interpretation of the input data, and write the code to
             | poke it into the correct places directly.
             | 
             | It is not that you should define your inputs separately
             | from your internal data storage. It's that the
             | specification of your input structure shouldn't exist as a
             | struct, it should exist as the consequence of your parsing
             | code.
             | 
             | > When you're just starting out, it may seem like
             | overengineering, but it isn't
             | 
             | It's a real shame that the internet has driven us to assume
             | everybody is a novice.
        
               | kgeist wrote:
               | >It's a real shame that the internet has driven us to
               | assume everybody is a novice.
               | 
               | Sorry, English is not my native language. I didn't mean
               | to say you're a novice.
               | 
               | >Usually, you take in that struct, and then immediatly
               | have some code to poke it into the appropriate places in
               | your actual data structures
               | 
               | >It is not that you should define your inputs separately
               | from your internal data storage. It's that the
               | specification of your input structure shouldn't exist as
               | a struct, it should exist as the consequence of your
               | parsing code.
               | 
               | Can you give an example?
        
               | delusional wrote:
               | > Can you give an example?
               | 
               | Sure. I like taking Jackson (the Java library) as an
               | example, since it actually supports both models. The way
               | I've seen it used mostly is with jackson-databind. Here
               | you define classes and annotate the fields with data that
               | tells the library how to marshall them to/from json.
               | Superficially, I find that similar to how Go or Serde
               | (from Rust) suggest you handle that. In that programming
               | model, I agree it makes total sense to declare some
               | classes separately from your core structures, for all the
               | reasons we've talked about.
               | 
               | The other model Jackson has is what they call the Jackson
               | Tree Model. In this model you get back a representation
               | of a Json Object. From that object you can get fields,
               | those fields can themselves be objects, arrays, or
               | immediate. It's an AST if you're a compiler person.
               | 
               | The first model might lead to code like this:
               | 
               |     public class Person {
               |         @JsonProperty
               |         String name;
               |         @JsonProperty
               |         String phoneNo;
               |     }
               | 
               | Usually, the annotations won't be able to fully specify
               | the constraints of your code, so you'll see usage code
               | like this:
               | 
               |     if(personDto.phoneNo.length() != 10)
               |         return Http(400, "Phone number must be 10 chars");
               |     person.phone = personDto.phoneNo;
               | 
               | With the Tree Model you'd instead get a representation of
               | the raw JSON from the client and pull out the fields you
               | care about yourself:
               | 
               |     var phoneNoObj = jsonBody.get("phoneNo");
               |     if(phoneNoObj == null)
               |         return Http(400, "Phone number is required");
               |     var phoneNoStr = phoneNoObj.asString();
               |     if(phoneNoStr == null)
               |         return Http(400, "Phone number must be a string");
               |     if(phoneNoStr.length() != 10)
               |         return Http(400, "Phone number must be 10 chars");
               |     person.phone = phoneNoStr;
               | 
               | Notice that we are now just doing a single _application
               | specific_ parse of the json, and while we were at it we
               | also got to surface a bunch more relevant errors. The
               | Jackson Tree model is obviously pretty inefficient, but
               | there are ways to implement it that make it more
               | efficient too.
        
               | kgeist wrote:
               | In Go, you can unmarshal JSON into a map, so you can
               | extract fields manually, too.
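A minimal Go sketch of that manual extraction, mirroring the Tree Model
checks from the Java example. The function name `parsePhone` is
hypothetical; the `phoneNo` key and the 10-character rule are taken from
the comment above. Only standard `encoding/json` calls are used.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// parsePhone pulls "phoneNo" out of a raw JSON body by hand.
// Hypothetical helper; the checks mirror the Java Tree Model example.
func parsePhone(body []byte) (string, error) {
	// Decode only the top level; each value stays as raw JSON bytes.
	var fields map[string]json.RawMessage
	if err := json.Unmarshal(body, &fields); err != nil {
		return "", fmt.Errorf("body must be a JSON object: %w", err)
	}
	// A map lookup is an exact, case-sensitive key match.
	raw, ok := fields["phoneNo"]
	if !ok {
		return "", errors.New("phone number is required")
	}
	var phone string
	if err := json.Unmarshal(raw, &phone); err != nil {
		return "", errors.New("phone number must be a string")
	}
	if len(phone) != 10 {
		return "", errors.New("phone number must be 10 chars")
	}
	return phone, nil
}

func main() {
	fmt.Println(parsePhone([]byte(`{"phoneNo":"0123456789"}`)))
	fmt.Println(parsePhone([]byte(`{"phoneNo":1234567890}`)))
	fmt.Println(parsePhone([]byte(`{"PHONENO":"0123456789"}`)))
}
```

Because the lookup goes through a map rather than struct tags, a key
such as "PHONENO" is treated as a different field and rejected as
missing.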
        
           | eadmund wrote:
           | > In my dayjob I see this tendency constantly to have a lot
           | of different very narrow structs that somehow integrate into
           | some library, and then a TON of supporting code to copy
           | between those structs. Only to then do very little actually
           | useful work with any of the data at the end.
           | 
           | Don't think of it as doing a little useful work at the _end_
           | ; think of it as doing all the useful work _in the centre_.
           | Your core logic should be as close to a pure implementation
           | without external libraries as possible (ideally zero, but
           | that is often not easily achievable), but call out to
           | external libraries and services to get its work done as
           | appropriate. That does mean a fair amount of data copying,
           | but IMHO it's worth it. Testing copies is easy and localised,
           | whereas understanding the implications of a JSON (or Protobuf,
           | or Django, or whatever) object carried deep into one's core
           | logic and passed into other services and libraries is very
           | very difficult.
           | 
           | There's a common theme with the ORM trap here. The cost of a
           | little bit of magic is often higher than the cost of a little
           | bit of copying.
        
         | liampulles wrote:
         | Yeah, it's a bad pattern. And we can expect to see it in a
         | sizable number of production codebases.
        
           | diggan wrote:
           | ORMs love to put people and projects in the situation where
           | it's much easier for data types in your program to match
           | exactly with the tables in your database, but reality often
           | requires them to be decoupled instead. A "create user"
           | request that contains "isAdmin" smells like someone mapped
           | that exact request to a table, and just because the table
           | has "isAdmin", the data type now also has it.
        
         | ajross wrote:
         | Exactly. This isn't a flaw in the runtime or the metaphor, the
         | bug here is that the app exposes private data in a public API.
         | That's a mistake that doesn't care about implementation choice.
         | You can make the same blunder with a RESTful interface or a CGI
         | script.
         | 
         | At most, you can argue that simple serialization libraries
         | (Go's is indeed one of the best) make it more tempting to "just
         | send the data" in such a design, so if you squint really
         | (really) hard, you can call this a "footgun" I guess.
         | 
         | But the rest of the headline is 100% nonsense. This is _not_
         | about  "Go" or "parsers". At all.
        
           | tln wrote:
           | What? It's about parsers in Go's standard library.
        
             | ajross wrote:
             | A "parser" is the piece of software that translates
             | unstructured input (usually text) into structured output
             | that better reflects the runtime of a programming language.
             | A "security bug" in a parser is normally construed to be
             | the existence of an input that causes the software to do
             | something incorrect/undocumented/unexpected.
             | 
             | Nothing in the article discusses a parser or anything like
             | a parser bug.
             | 
             | The article doesn't like that the semantics of the user-
             | facing API wrapped around the parser is, I guess, "easy to
             | make mistakes with". That's an article about API design, at
             | most. But that's boring and specious and doesn't grab
             | clicks, so they want you to think that Go's parsers are
             | insecure instead.
        
               | sfvisser wrote:
               | Exactly right. Better have a domain layer with data types
               | representing the domain object 1:1 and add one or more
               | API layers on top for interacting with those for some
               | modality. Creation, deletion, verification, auth etc.
               | 
               | The security failure is not the parsing library, but
               | failing to model your application architecture properly.
        
               | wavemode wrote:
               | > Nothing in the article discusses a parser or anything
               | like a parser bug.
               | 
               | And it doesn't claim to. The article is titled "footguns"
               | not "bugs". A footgun is just something that is easy to
               | misuse due to unintuitive or unexpected behavior.
        
               | dimgl wrote:
               | > And it doesn't claim to.
               | 
               | Yes it does. The title is literally "Unexpected security
               | footguns in Go's parsers". The article didn't identify a
               | single footgun. This is just bad design.
        
               | mplanchard wrote:
               | The article links to CVEs directly caused by some of the
               | less intuitive behavior here. I feel like that's
               | sufficient to qualify as a footgun
        
               | ajross wrote:
               | That title is clearly constructed to imply that Go's
               | parsers are insecure. The text of the article isn't about
               | parsers at all, and only tangentially about Go[1].
               | 
               | It's deliberately misleading clickbait. You know it. I
               | know it. We all know it.
               | 
               | If you want to have a considered discussion about
               | pitfalls with the use of automatic serialization
               | paradigms across trust boundaries, I'm here for it. If
               | you just want to flame about Go, get better source
               | material. This one isn't the hill to stand on.
               | 
               | [1] Which, again, has a really first rate serialization
               | story; but not one fundamentally different from any of a
               | zillion others. Cooking data from untrusted sources is
               | just plain hard, and not something that anyone (much less
               | the author of this awful blog post) is going to solve
               | with a serialization API.
        
               | ioasuncvinvaer wrote:
               | I disagree.
               | 
               | It was a bad design choice to allow JSON keys with
               | different capitalisation and to serialise all public
               | struct members by default.
               | 
               | These decisions can easily result in the creation of an
               | insecure system.
        
               | woodruffw wrote:
               | I don't think the title is clickbait. You're fixating on
               | just one example in the post, but the others demonstrate
               | a clear pattern: there _are_ footguns in Go's various
               | parsers, including forms of malleability or flexibility
               | that enable parser differentials.
               | 
               | This isn't intended to be a ding against Go; it's a
               | universal problem across programming languages (otherwise
               | there'd be no differentials at all). But they're worth
               | noting, and I think the post amply demonstrates their
               | security relevance.
        
               | x0x0 wrote:
               | Of course it's about Go. They decided to build not just
               | opt-out unmarshaling of objects, but fragile opt-out.
               | 
               | See eg rails' strong params for the opposite approach:
               | opt-in.
        
               | CactusRocket wrote:
               | I think you and I read a different article. For example,
               | about treating JSON keys without case sensitivity:
               | 
               | > In our opinion, this is the most critical pitfall of
               | Go's JSON parser because it differs from the default
               | parsers for JavaScript, Python, Rust, Ruby, Java, and all
               | other parsers we tested.
               | 
               | It would be kind of difficult to argue that this is not
               | about Go.
               | 
               | Don't get me wrong, I love Go just as much as the next
               | person, but this article definitely lays bare some issues
               | with serialization specific to Go.
        
               | woodruffw wrote:
               | I would consider the case-insensitive key handling in
               | Go's JSON parser and the malleability in the XML parser
               | to both be "classic" examples of differential-prone
               | behavior, which is the ultimate point made by the
               | article.
               | 
               | (This is why the more formal definition of a parser is
               | useful to ground on: a parser is a type of _recognizer_ ,
               | and disagreements between recognizers that _claim_ to
               | recognize the same thing can be exploited. This doesn't
               | require a bug per se, only difference, which is why it's
               | a footgun.)
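A short sketch of the case-insensitive key handling discussed above,
using only `encoding/json`. The `Request` type and `decodeAction` helper
are hypothetical names for illustration; the payload shape follows the
kind of example the article describes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request is a hypothetical payload with a single tagged field.
type Request struct {
	Action string `json:"action"`
}

// decodeAction unmarshals body and returns whatever ended up in Action.
func decodeAction(body string) (string, error) {
	var req Request
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		return "", err
	}
	return req.Action, nil
}

func main() {
	// Both keys match the "action" tag: the first exactly, the second
	// case-insensitively. The later key overwrites the earlier one.
	got, err := decodeAction(`{"action":"UserAction","aCtIoN":"AdminAction"}`)
	fmt.Println(got, err) // AdminAction <nil>
}
```

A parser in another service that matches keys exactly would read
"UserAction" from the same bytes, which is the differential being
described.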
        
         | ljm wrote:
         | Aside from security issues it presents, it's just a really bad
         | idea to tightly couple your data model to your API like that.
         | 
         | Any change to your DB schema is liable to become a breaking
         | change on your API. If you need separate types for your
         | requests and responses so be it.
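One way to sketch the "separate types for requests and responses" idea
in Go. All type and function names here (`User`, `CreateUserRequest`,
`UserResponse`, `newUser`, `render`) are hypothetical and chosen only
for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is the internal (storage) model. It carries no JSON tags because
// it is never meant to cross the API boundary directly.
type User struct {
	ID       int
	Username string
	Password string
	IsAdmin  bool
}

// CreateUserRequest lists only what a client is allowed to set.
type CreateUserRequest struct {
	Username string `json:"username"`
	Password string `json:"password"`
}

// UserResponse lists only what a client is allowed to see.
type UserResponse struct {
	ID       int    `json:"id"`
	Username string `json:"username"`
}

// newUser parses a request body and copies the allowed fields into the
// internal model. IsAdmin cannot be set from the request.
func newUser(body []byte, id int) (User, error) {
	var req CreateUserRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return User{}, err
	}
	return User{ID: id, Username: req.Username, Password: req.Password}, nil
}

// render copies the public fields out of the internal model.
func render(u User) ([]byte, error) {
	return json.Marshal(UserResponse{ID: u.ID, Username: u.Username})
}

func main() {
	u, err := newUser([]byte(`{"username":"alice","password":"pw","isAdmin":true}`), 1)
	fmt.Println(u.IsAdmin, err)
	out, _ := render(u)
	fmt.Println(string(out))
}
```

The copying is the cost mentioned earlier in the thread; in exchange, a
schema change to `User` does not silently change the API.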
        
         | matthew16550 wrote:
         | It's a concept from Fielding's REST thesis that is important to
         | the current meaning of REST:
         | 
         | Transfer Objects are not the Storage Objects.
        
       | aintnolove wrote:
       | I know these problems are easily avoidable... but I'm finally
       | starting to see the appeal of protocol buffers.
       | 
       | Just to have the assurance that, regardless of programming
       | language, you're guaranteed a consistent ser/de experience.
        
         | chubot wrote:
         | It would be nice if it were true :-)
         | 
         | But it's not, for reasons that have more to do with the
         | languages themselves, than parsing
         | 
         | e.g. C++ numbers are different than Java numbers are different
         | than Python numbers are different than JavaScript numbers
         | 
         | ditto for strings
        
           | CactusRocket wrote:
           | I imagine that one of the points of a solid protocol buffers
           | library would be to align the types even across programming
           | languages. E.g. explicitly force a 64-bit integer rather than
           | "int" relying on the platform. And to have some custom
           | "string" type which is always UTF-8 encoded in memory rather
           | than depending on the platform-specific encoding.
           | 
           | (I have no idea if that is the case with protobuf, I don't
           | have enough experience with it.)
        
             | chubot wrote:
             | Why would you "imagine" that?
             | 
             | Again, the problem has more to do with the programming
             | languages themselves, rather than with protobufs or
             | parsing.
             | 
             | Protobuf has both signed and unsigned integers - the
             | initial use case was C++ <-> C++ communication
             | 
             | Java doesn't have unsigned integers
             | 
             | Python has arbitrary precision integers
             | 
             | JavaScript traditionally only had doubles, which means it
             | can represent integers up to 53 bit exactly. It has since
             | added arbitrary size integers -- but that doesn't mean that
             | the protobuf libraries actually use them
             | 
             | ---
             | 
             | These aren't the only possibilities -- every language is
             | fundamentally different
             | 
             | OCaml has 31- or 63-bit integers IIRC
             | 
             | https://protobuf.dev/programming-guides/encoding/#int-types
             | 
             | And again, strings also differ between all these languages
             | -- there are three main choices, which are basically 8-bit,
             | 16-bit, or 32-bit code units
             | 
             | Go and Rust favor 8-bit units; Java and JavaScript favor
             | 16-bit units; and Python/C/C++ favor 32-bit units (which
             | are code points)
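The 53-bit limit mentioned for doubles can also be observed in Go when
JSON numbers are decoded into an untyped value, since `encoding/json`
uses float64 there by default. The sketch below is about JSON rather
than protobuf, and the helper names `decodeDefault` and `decodeExact`
are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeDefault decodes a JSON number into `any`, which yields float64.
func decodeDefault(s string) (float64, error) {
	var v any
	if err := json.Unmarshal([]byte(s), &v); err != nil {
		return 0, err
	}
	f, ok := v.(float64)
	if !ok {
		return 0, fmt.Errorf("not a number: %T", v)
	}
	return f, nil
}

// decodeExact uses Decoder.UseNumber so the digits are kept as a
// json.Number and can be converted to int64 without going via float64.
func decodeExact(s string) (int64, error) {
	dec := json.NewDecoder(strings.NewReader(s))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return 0, err
	}
	n, ok := v.(json.Number)
	if !ok {
		return 0, fmt.Errorf("not a number: %T", v)
	}
	return n.Int64()
}

func main() {
	const in = "9007199254740993" // 2^53 + 1
	f, _ := decodeDefault(in)
	i, _ := decodeExact(in)
	fmt.Printf("%.0f %d\n", f, i)
}
```

Two services that decode the same number through different paths can
therefore disagree on its value above 2^53.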
        
               | CactusRocket wrote:
               | I think I didn't explain myself well.
               | 
               | As long as a language has bytes and arrays, you can
               | implement anything on top of them, like unsigned
               | integers, 8-bit strings, UTF-8 strings, UCS-2, whatever
               | you want. Sure it won't be native types, so it will
               | probably be slower and could have an awkward memory
               | layout, but it's possible
               | 
               | Granted, if a language is so gimped that it doesn't even
               | have integers (as you mentioned JavaScript), then that
               | language will not be able to fully support it indeed.
        
               | chubot wrote:
               | Unfortunately that doesn't solve the problem -- it only
               | pushes it around
               | 
               | I recommend writing a protobuf generator for your
               | favorite language. The less it looks like C++, the more
               | hard decisions you'll have to make
               | 
               | If you try your approach, you'll feel the "tax" when
               | interacting with idiomatic code, and then likely make the
               | opposite decision
               | 
               | ---
               | 
               | Re: "so gimped" --> this tends to be what protobuf API
               | design discussions are like. Users of certain languages
               | can't imagine the viewpoints of users of other languages
               | 
               | e.g. is unsigned vs. signed the way the world is? Or an
               | implementation detail.
               | 
               | And it's a problem to be MORE expressive than C/C++ --
               | i.e. from idiomatic Python code, the protobuf data model
               | also causes a problem
               | 
               | Even within C/C++, there is more than one dialect -- C++
               | 03 versus C++ 11 with smart pointers (and probably more
               | in the future). These styles correspond to the protobuf
               | v1 and protobuf v2 APIs
               | 
               | (I used both protobuf v1 and protobuf v2 for many years,
               | and did a design review for the protobuf v3 Python API)
               | 
               | In other words, protobufs aren't magic; they're another
               | form of parsing, combined with code generation, which
               | solve some technical problems, and not others. They also
               | don't resolve arguments about parsing and serialization!
        
         | liampulles wrote:
         | I think you can get similar benefits here from writing an RPC
         | style JSON API into an OpenAPI spec and generating structs and
         | route handlers from that. That's what I do for most of my Go
         | projects anyway.
        
       | em-bee wrote:
       | i am not aware of any parser that does that differently, but i
       | would also argue that this is not the job of parsers. after
       | parsing (or before exporting) there should be a data validation
       | step based on whitelists.
       | 
       | so the user can send in unknown fields all they want, the code
       | will only accept the username and firstname strings, and ignore
       | the other ones.
       | 
       | same with fetching data and sending it to the user. i fetch only
       | the fields i want and create the correct datastructures before
       | invoking the marshaling step.
       | 
       | there are no footguns. if you expect your parser to protect you
       | you are using it wrong. they were not designed for that.
       | 
       | input -> parse -> extract the fields we want, which are valid ->
       | create a data-structure with those fields.
       | 
       | data -> get fields i want -> create datastructures with only
       | wanted fields -> write to output format
        
         | securesaml wrote:
         | This is correct. In the blog post they say:
         | 
         | > Other examples exist, but most follow the same pattern: the
         | component that does security checks and the component that
         | performs the actions differ in their view of the input data.
         | 
         | This would be solved (as you described), by ensuring that the
         | downstream layer uses only contents that are verified in the
         | security check layer.
         | 
         | If they are using a microservice then: Security check API ->
         | return verified data (i.e. re-serialize the verified JSON or
         | XML into byte form, NOT the original input) -> Processing layer
         | i.e. userCreate API uses verified data.
         | 
         | This is the method we used in fixing the ruby-saml example.
         | 
         | See:
         | https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
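A rough Go sketch of the "return verified data, not the original input"
flow described above. This is not the actual ruby-saml or gosaml2 fix;
the `createUser` type and the `verify` function are hypothetical and
only illustrate re-serializing what the checking layer actually parsed.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// createUser is the hypothetical shape the security check understands.
type createUser struct {
	Username string `json:"username"`
}

// verify parses the untrusted input, checks it, and returns a freshly
// serialized copy of what it parsed. Downstream code should consume the
// returned bytes and never the original input.
func verify(raw []byte) ([]byte, error) {
	var req createUser
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	if req.Username == "" {
		return nil, errors.New("username is required")
	}
	return json.Marshal(req)
}

func main() {
	// Duplicate and unknown keys are gone from the verified bytes, so a
	// downstream parser has nothing left to interpret differently.
	out, err := verify([]byte(`{"username":"alice","USERNAME":"bob","isAdmin":true}`))
	fmt.Println(string(out), err)
}
```

Whatever value the verifier settled on is the only one the processing
layer can see, regardless of how its own parser treats duplicates or
case.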
        
       | fainpul wrote:
       | I didn't see this mentioned in the article: wouldn't the obvious
       | way be to make the private field private (lowercase)?
       | 
       | (I'm not a Go developer, just tried the language casually).
       | 
       |     type User struct {
       |         Username string `json:"username_json_key,omitempty"`
       |         Password string `json:"password"`
       |         isAdmin  bool
       |     }
       | 
       | https://go.dev/play/p/1m-6hO93Xce
        
         | never_inline wrote:
         | You can annotate `json:"-"` which is the equivalent of @JsonIgnore
        
           | zimpenfish wrote:
           | Covered in the article along with a potential snafu - adding
           | anything else (`-,omitempty`) turns it into a key "-", not
           | the "ignore this field" indicator.
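The difference between the two tags can be sketched as follows, using
only `encoding/json`. The struct and helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ignored uses the exact tag "-", which excludes the field from JSON.
type Ignored struct {
	Username string `json:"username"`
	IsAdmin  bool   `json:"-"`
}

// Renamed adds an option after "-", which makes "-" the field's key.
type Renamed struct {
	Username string `json:"username"`
	IsAdmin  bool   `json:"-,omitempty"`
}

// isAdminIgnored reports IsAdmin after decoding body into Ignored.
func isAdminIgnored(body string) (bool, error) {
	var u Ignored
	err := json.Unmarshal([]byte(body), &u)
	return u.IsAdmin, err
}

// isAdminRenamed reports IsAdmin after decoding body into Renamed.
func isAdminRenamed(body string) (bool, error) {
	var u Renamed
	err := json.Unmarshal([]byte(body), &u)
	return u.IsAdmin, err
}

func main() {
	const body = `{"username":"mallory","-":true}`
	fmt.Println(isAdminIgnored(body)) // false <nil>
	fmt.Println(isAdminRenamed(body)) // true <nil>
}
```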
        
         | arp242 wrote:
         | That works, but then you also can't access isAdmin from another
         | package.
        
           | pdq wrote:
           | You can just make an IsAdmin() public function to expose it.
        
         | zimpenfish wrote:
         | > wouldn't the obvious way be to make the private field private
         | (lowercase)?
         | 
         | That may break other things - `gorm`, for example, will ignore
         | private fields - inconvenient if you want to serialise `User`
         | to your DB.
        
           | CactusRocket wrote:
           | If you use the same struct in both an HTTP API and an ORM,
           | you're Doing It Wrong in my opinion. These should be
           | completely separated. Exactly to prevent accidental leaking
           | or injection of data.
        
             | zimpenfish wrote:
             | > If you use the same struct in both an HTTP API and an
             | ORM, you're Doing It Wrong in my opinion.
             | 
             | If you mean "public API", yep, 100% agree. Internal API
             | between microservices though? Perfectly safe and cromulent,
             | I'd say.
        
       | asimops wrote:
       | In the case of Attack scenario 2, I do not get why in a secure
       | design you would ever forward the client originating data to the
       | auth service. This is more of a broken best practice than a
       | footgun to me.
       | 
       | The logic should be "Parse, don't validate"[0] and after that you
       | work on those parsed data.
       | 
       | [0]: https://hn.algolia.com/?q=https%3A%2F%2Flexi-
       | lambda.github.i...
        
         | securesaml wrote:
         | See:
         | https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
         | that was about a signature wrapping attack in crypto, but it
         | also applies here.
        
       | glenjamin wrote:
       | It's worth noting that if you DisallowUnknownFields it makes it
       | much harder to handle forward/backward compatible API changes -
       | which is a very common and usually desirable pattern
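A small sketch of that trade-off, using `Decoder.DisallowUnknownFields`
from `encoding/json`. The `CreateUser` type, the `nickname` field, and
the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// CreateUser is what this (older) server version knows about.
type CreateUser struct {
	Username string `json:"username"`
}

// decodeStrict rejects any key that has no matching struct field.
func decodeStrict(body string) (CreateUser, error) {
	dec := json.NewDecoder(strings.NewReader(body))
	dec.DisallowUnknownFields()
	var req CreateUser
	err := dec.Decode(&req)
	return req, err
}

// decodeLenient silently drops keys it does not recognise.
func decodeLenient(body string) (CreateUser, error) {
	var req CreateUser
	err := json.Unmarshal([]byte(body), &req)
	return req, err
}

func main() {
	// A newer client sends a field this server has not learned yet.
	const body = `{"username":"alice","nickname":"al"}`
	_, strictErr := decodeStrict(body)
	req, lenientErr := decodeLenient(body)
	fmt.Println("strict: ", strictErr)
	fmt.Println("lenient:", req.Username, lenientErr)
}
```

The strict decoder catches an unexpected "isAdmin" just as it rejects
the harmless "nickname", so the choice depends on whether the endpoint
needs to tolerate newer clients.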
        
       | jerf wrote:
       | There's a lot of good information in here, but if you think this
       | is a time to go all language supremacist about how much better
       | your language is and how this proves Go sucks, you might want to
       | double-check the CVE database real quick first. A lot of these
       | issues are more attributable to plain ol' user error and the
       | fundamental messiness of JSON and XML than Go _qua_ Go and I've
       | seen many similar issues everywhere.
       | 
       | For instance, there simply isn't a "correct" way for a parser to
       | handle duplicate keys. Because the problem they have is different
       | layers seeing them differently, you can have the problem anywhere
       | duplicate keys are treated differently, and it's not like Go is
       | the only thing to implement "last wins". It doesn't matter what
       | you do. Last wins? Varies from the many "first wins"
       | implementations. First wins? Varies from the many "last wins"
       | implementations. Nondeterministically choose? Now you conflict
       | with everyone, even yourself, sometimes. Crash or throw an
       | exception or otherwise fail? Now you've got a potential DOS.
       | There's no way for a parser to win here, in any language. The
       | code using the parser has some decisions to make.
       | 
       | Another example, the streaming JSON decoder "accepts" trailing
       | garbage data because by the act of using the streaming JSON
       | decoder you have indicated a willingness to potentially decode
       | more JSON data. You can use this to handle newline-separated
       | JSON, or other interesting JSON protocols where you're not
       | committing to the string being just one JSON value and absolutely
       | nothing else. It's not "an issue they're not planning on fixing",
       | it's a _feature_ with an absolutely unavoidable side effect in
       | the context of streams. The JSON parser stops reading the stream
       | at the end of the complete JSON object, by design, and anything
       | else would be wrong because it would be _consuming_ a value to
       | "check" whether the next thing is JSON or not, when you may not
       | even be "claiming" that the "next thing" is JSON, and whatever
       | input it consumed to verify a claim that nobody is even making
       | would itself be a bug.
       | 
       | Accepting user input into sensitive variables is a common mistake
       | I've seen multiple times in a number of languages. The root
       | problem there is more the tension between convenience and
       | security than languages themselves; any language can make it so
       | convenient to load data that developers accidentally load more
       | data than they realize.
       | 
       | Etc. The best lesson to take away from this is that there is more
       | than meets the eye with JSON and XML and they're harder to use
       | _safely_ than their ease of use suggests.
       | 
       | Although in the interests of fairness I also consider case
       | insensitivity in the JSON field names to be a mistake; maybe it
       | should be an option, JSON can get messy in the real world, but
       | it's a bad default. I have other quibbles but most of them are
       | things where there isn't a correct answer where you can
       | unambiguously say that some choice is wrong. JSON is really quite
       | fundamentally messier than people realize, and XML, while
       | generally more tightly specified at the grammar level than JSON
       | is, is generally quite messy in the protocols people build on top
       | of it.
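The streaming-decoder behaviour described above can be sketched like
this. `decodeFirst` shows the default: `Decoder.Decode` stops after one
value and leaves the rest of the stream untouched. `decodeOnly` is one
possible way for calling code to insist on a single value; both helper
names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// decodeFirst reads one JSON value and ignores whatever follows it.
func decodeFirst(input string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var v map[string]any
	err := dec.Decode(&v)
	return v, err
}

// decodeOnly reads one JSON value and then requires end of input.
func decodeOnly(input string) (map[string]any, error) {
	dec := json.NewDecoder(strings.NewReader(input))
	var v map[string]any
	if err := dec.Decode(&v); err != nil {
		return nil, err
	}
	// A second Decode returns io.EOF only if nothing but whitespace
	// remains; another value or garbage gives nil or a syntax error.
	var extra json.RawMessage
	if err := dec.Decode(&extra); !errors.Is(err, io.EOF) {
		return nil, errors.New("unexpected data after first JSON value")
	}
	return v, nil
}

func main() {
	const input = `{"a":1} garbage`
	_, err := decodeFirst(input)
	fmt.Println("Decoder:  ", err)
	_, err = decodeOnly(input)
	fmt.Println("strict:   ", err)
	var v map[string]any
	fmt.Println("Unmarshal:", json.Unmarshal([]byte(input), &v))
}
```

`json.Unmarshal` already rejects trailing data because it is handed the
whole input at once, so the extra check only matters when a Decoder is
used for input that is supposed to hold a single value.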
        
         | mplanchard wrote:
         | The article didn't claim that "last wins" is in and of itself
         | an issue, but that the differences between who wins between
         | parsers across services/languages can cause issues. Their
         | position was that everyone should standardize on "last wins,"
         | since that is the most common.
        
           | securesaml wrote:
           | The correct conclusion is:
           | https://news.ycombinator.com/item?id=44337330
           | 
           | The problem of trying to ensure that each parser behaves the
           | same for all input is twofold:
           | 
           | - JSON and XML specifications are complex, lots of quirks. So
           | not feasible.
           | - Does not solve the fundamental issue of the processing
           | layer not using the same data that is verified in the
           | verification layer.
           | 
           | Note: the processing layer parses the original input bytes,
           | while the verification layer verifies a struct that is parsed
           | using another parser.
           | 
           | Processed: Proc(input)
           | Verified: VerifyingParser(input)
        
         | conradludgate wrote:
         | If a parser throwing an error due to duplicate fields causes a
         | DoS, then the same is going to happen when it throws an error
         | due to invalid encoding...
         | 
         | So, I don't think that's a relevant critique. I think any
         | ambiguous case in parsing untrusted user input should raise an
         | error, and anyone working on code with untrusted data should be
         | ready to handle errors
        
       | piinbinary wrote:
       | Kudos to the author for making this very clear and easy to
       | understand. More technical writing should be like this.
       | 
       | On another note, it's mind-blowing that a single string can parse
       | as XML, JSON, _and_ YAML.
        
       | tptacek wrote:
       | These kinds of issues (parser differentials in particular) are
       | why you shouldn't trust Go SAML implementations that use
       | `encoding/xml`, which was never designed for that application to
       | begin with; I just wrote my own for my SAML.
       | 
       | (I mean, don't use SAML to begin with, but.)
        
         | securesaml wrote:
         | The issue is not with Go's parser, but with the processing
         | layer using different input than the verifying layer [1]
         | 
         | We patched gosaml2 (and other Go SAML libraries) by
         | ensuring only the authenticated bytes are processed (not the
         | original XML document). You can see the patches here:
         | https://github.com/russellhaering/goxmldsig/commit/e1c8a5b89...
         | https://github.com/russellhaering/gosaml2/commit/99574489327...
         | 
         | > I just wrote my own for my SAML.
         | 
         | Curious to see your implementation for SAML and XML Signatures.
         | 
         | [1]:
         | https://bsky.app/profile/filippo.abyssdomain.expert/post/3le...
        
       | Rikudou wrote:
       | And people keep on hating on PHP... Case-insensitive JSON parsing
       | is another level, one even PHP wouldn't cross.
        
       ___________________________________________________________________
       (page generated 2025-06-21 23:00 UTC)