[HN Gopher] Parsing Protobuf Definitions with Tree-sitter
___________________________________________________________________
Parsing Protobuf Definitions with Tree-sitter
Author : PaulHoule
Score : 35 points
Date : 2024-08-03 19:13 UTC (3 hours ago)
(HTM) web link (relistan.com)
| cyberax wrote:
| I don't get it. Why not just use a better Protobuf model? Go's
| serialization format for protobufs is not the most brilliant one,
| but it's reasonable.
|
| E.g. just use `string` instead of `StringValue`.
| MathMonkeyMan wrote:
| I need to get around to playing with tree-sitter. The approach in
| this article is neat.
|
| Here's another approach. The AST of a .proto file is itself a
| protobuf. That's how the codegen plugins work. Protobuf also has
| a canonical mapping to JSON, so...
|
| What you can do is use protoc to parse the .proto file, spit it
| out as JSON, and then process that data using your favorite
| pattern matching language. I wrote a [tool][1] that helps with
| that. For example, here's some [js code][2] that translates
| protobuf message definitions into "types" for use in an ORM.
|
| [1]: https://github.com/dgoffredo/protojson
|
| [2]: https://github.com/dgoffredo/okra/blob/master/lib/proto2type...
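To make the "parse with protoc, dump as JSON, then pattern match" idea concrete, here is a minimal sketch in Python. It assumes the JSON follows protobuf's canonical JSON mapping of a `FileDescriptorSet` (lowerCamelCase keys such as `messageType` and `typeName`, enum values as names). The thread does not show what protojson actually emits, so the sample document, the `demo` package, and the helper names `iter_messages` and `describe_fields` are all hypothetical.

```python
import json

# Hypothetical sample: a FileDescriptorSet rendered with protobuf's canonical
# JSON mapping (lowerCamelCase field names, enum values written as names).
DESCRIPTOR_JSON = """
{
  "file": [
    {
      "name": "user.proto",
      "package": "demo",
      "messageType": [
        {
          "name": "User",
          "field": [
            {"name": "id", "number": 1, "label": "LABEL_OPTIONAL",
             "type": "TYPE_INT64"},
            {"name": "nickname", "number": 2, "label": "LABEL_OPTIONAL",
             "type": "TYPE_MESSAGE",
             "typeName": ".google.protobuf.StringValue"},
            {"name": "tags", "number": 3, "label": "LABEL_REPEATED",
             "type": "TYPE_STRING"}
          ]
        }
      ]
    }
  ]
}
"""


def iter_messages(descriptor_set):
    """Yield (qualified_name, message_dict) for every message, nested ones included."""
    def walk(prefix, message):
        qualified = f"{prefix}.{message['name']}" if prefix else message["name"]
        yield qualified, message
        for nested in message.get("nestedType", []):
            yield from walk(qualified, nested)

    for file in descriptor_set.get("file", []):
        for message in file.get("messageType", []):
            yield from walk(file.get("package", ""), message)


def describe_fields(message):
    """Return (name, type, repeated) tuples for one message."""
    rows = []
    for field in message.get("field", []):
        # Message and enum fields name their target in typeName;
        # scalar fields only carry the TYPE_* enum name.
        type_name = field.get("typeName") or field["type"]
        repeated = field.get("label") == "LABEL_REPEATED"
        rows.append((field["name"], type_name, repeated))
    return rows


descriptor_set = json.loads(DESCRIPTOR_JSON)
schema = {name: describe_fields(msg) for name, msg in iter_messages(descriptor_set)}
```

Here `schema` maps each fully qualified message name to its field list, which is roughly the kind of intermediate form an ORM type generator could start from.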
| superb_dev wrote:
| Oh my god... this might've made some tools I'm developing a lot
| easier
| fizx wrote:
| Writing a protoc plugin would have been 5x easier, but it's
| harder to get a blog article out of it.
|
| Also, this reads like they might not have seen the newer proto3
| optional keyword, or know about the well-known wrapper types.
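For readers who have not met either feature, the sketch below shows how a tool might treat both as "nullable scalar" markers. It assumes each field is a dict in the canonical JSON form of a `FieldDescriptorProto` (keys `type`, `typeName`, `proto3Optional`). The wrapper names come from `google/protobuf/wrappers.proto`; the function name `nullable_scalar` is hypothetical and is not taken from the article.

```python
# Well-known wrapper messages from google/protobuf/wrappers.proto and the
# scalar type each one wraps.
WRAPPER_TO_SCALAR = {
    ".google.protobuf.DoubleValue": "double",
    ".google.protobuf.FloatValue": "float",
    ".google.protobuf.Int64Value": "int64",
    ".google.protobuf.UInt64Value": "uint64",
    ".google.protobuf.Int32Value": "int32",
    ".google.protobuf.UInt32Value": "uint32",
    ".google.protobuf.BoolValue": "bool",
    ".google.protobuf.StringValue": "string",
    ".google.protobuf.BytesValue": "bytes",
}

# Field types that are not plain scalars.
NON_SCALAR_TYPES = {"TYPE_MESSAGE", "TYPE_ENUM", "TYPE_GROUP"}


def nullable_scalar(field):
    """Return the scalar type name if the field is a nullable scalar, else None.

    A field counts as nullable here if it uses a well-known wrapper message
    or is a scalar declared with the proto3 `optional` keyword.
    """
    wrapped = WRAPPER_TO_SCALAR.get(field.get("typeName", ""))
    if wrapped:
        return wrapped
    field_type = field.get("type", "")
    if (
        field.get("proto3Optional")
        and field_type.startswith("TYPE_")
        and field_type not in NON_SCALAR_TYPES
    ):
        # "TYPE_STRING" -> "string", "TYPE_INT64" -> "int64", and so on.
        return field_type[len("TYPE_"):].lower()
    return None
```

Under this sketch, a `google.protobuf.StringValue` field and an `optional string` field both come out as a nullable `string`, while a plain `string` field does not.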
| grumbles wrote:
| Huh. tree-sitter seems neat, but I don't really get why the
| author thinks processing the descriptor set is so hard. Seems
| equally difficult to learn a bunch of new abstractions in the
| form of tree-sitter vs just learning protobuf's own ones.
|
| Also, if you're parsing .proto files directly, you have to deal
| with a bunch of annoying issues like include paths, how you
| package sets of them to move around, etc. Descriptor sets seem
| like a better solution to me.
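As a rough illustration of the descriptor-set route, the sketch below assembles a protoc invocation that writes one self-contained descriptor set. The flags `--proto_path`, `--descriptor_set_out`, and `--include_imports` are real protoc options; the helper names, file names, and directories are hypothetical, and actually running it assumes `protoc` is on the PATH.

```python
import subprocess


def descriptor_set_command(proto_files, include_dirs, out_path):
    """Build the protoc argument list for a self-contained descriptor set."""
    args = ["protoc"]
    # One --proto_path per import root, so include paths are resolved once here.
    args += [f"--proto_path={directory}" for directory in include_dirs]
    # --include_imports bundles every transitively imported file into the
    # output, so the single .pb file can be moved around on its own.
    args += [f"--descriptor_set_out={out_path}", "--include_imports"]
    args += list(proto_files)
    return args


def build_descriptor_set(proto_files, include_dirs, out_path):
    """Run protoc; raises CalledProcessError if compilation fails."""
    subprocess.run(
        descriptor_set_command(proto_files, include_dirs, out_path),
        check=True,
    )
```

For example, `descriptor_set_command(["user.proto"], ["protos", "third_party"], "schema.pb")` returns the argument list without running anything, and `build_descriptor_set` with the same arguments would produce `schema.pb` for downstream tools to read.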
| pcj-github wrote:
| From the docs: "The protocol compiler can output a
| FileDescriptorSet containing the .proto files it parses."
| (https://github.com/protocolbuffers/protobuf/blob/main/src/go...)
|
| I don't understand the point of using tree-sitter to repeat that
| work (almost certainly having bugs doing so). Am I missing
| something?
| Arainach wrote:
| Like others, I don't understand the author's issues getting the
| stock proto reflection behavior to extract this information.
|
| I'm not as familiar with the Go reflection tools, but getting the
| information the author wants is trivial in Java reflection.
___________________________________________________________________
(page generated 2024-08-03 23:00 UTC)