[HN Gopher] Optimizing a Math Expression Parser in Rust
___________________________________________________________________
Optimizing a Math Expression Parser in Rust
Author : serial_dev
Score : 121 points
Date : 2025-07-10 09:27 UTC (13 hours ago)
(HTM) web link (rpallas.xyz)
(TXT) w3m dump (rpallas.xyz)
| tialaramex wrote:
| This reminds me I should actually write a "natural" arithmetic
| expression parser for my Rust crate realistic
|
| Right now, realistic can parse "(* (^ 40 0.5) (^ 90 0.5))" and it
| will tell you that's 60, because yeah, it's sixty, that's how
| real arithmetic works.
|
| But it would be nice to write "(40^0.5) * (90^0.5)" or similar
| and have that work instead or as well. The months of work on
| realistic meant I spent so long without a "natural" parser that I
| got used to this.
| thrance wrote:
| I think having a Polish notation parser is good enough for
| math-y applications, I wouldn't worry about it too much if I
| were you. Nice crate by the way!
| ramon156 wrote:
| This would be a perfect class project. First lesson is letting
| them go loose; the second lesson would be to find out which
| optimizations they used and which are available
| makapuf wrote:
| I very much like these optimization articles. What could be
| interesting is to look at benchmarks with not only wall time
| but also other metrics:
|
| - cpu time (better CPU usage can mean shorter wall time but
| higher CPU)
|
| - memory usage
|
| - but also and maybe more interestingly complexity of code (not
| an absolute metric, but a very complex/non portable code for 5%
| speedup may or may not be worth it)
|
| EDIT: formatting
| catfacts wrote:
| I am not even a newbie in Rust, and this could be just
| nitpicking, but it seems that match is comparing strings and
| not characters. If that is the case, then I think Common Lisp
| can optimize more, since there is a special comparison for
| characters in CL.
|
| Edited: In the optimized version the author uses bytes and
| generators and avoids using strings. I don't know if Rust
| generators are optimized for speed or memory; ideally you could
| define the length of the buffer according to the memory cache
| available.
|
| Edited: I find it strange to use input = read_input_file()? and
| then eval(&input); what happens when there is an error reading
| the file? Rust is supposed to be a high-security language. In CL
| there are keywords like :if-does-not-exist to decide what to do,
| and read also accepts additional parameters for end-of-file and
| for expressing that this read is nested inside another read.
|
| I should stop comparing Rust to CL; better to learn Rust first.
| I consider this kind of article a very good way of learning
| Rust for those interested in parsing and optimization. Rust
| seems to be a very nice language when you can afford the time
| to develop your program.
| VGHN7XDuOXPAzol wrote:
| If you wanted to match on characters (`char`s), you could do
| this with single quotes (`'+'`).
|
| Or if you wanted to do it on bytes, you could use byte
| literals (`b'+'`).
|
| Unsure if that would provide a meaningful boost or not
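| A minimal sketch of the byte-literal form (`classify` is a
| hypothetical helper, not code from the article; the `char`
| version would use '+' patterns on a char instead):

```rust
// Match on raw bytes with byte-literal patterns (b'+').
fn classify(b: u8) -> &'static str {
    match b {
        b'+' => "plus",
        b'*' => "star",
        b'0'..=b'9' => "digit",
        _ => "other",
    }
}

fn main() {
    assert_eq!(classify(b'+'), "plus");
    assert_eq!(classify(b'7'), "digit");
    println!("{}", classify(b'*'));
}
```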
| catfacts wrote:
| Thanks for all the information you provided. I will read Rust
| by Example and stop posting in this thread to avoid deviating
| from the OP. Anyway, perhaps other readers are learning Rust
| and having the same questions in their minds, so your answers
| are also welcome for them.
|
| Edited: I will eliminate my catfacts username (changing the
| password to a random one). I don't like being downvoted, and I
| know I should not mention it, but things are what they are.
| Goodbye catfacts!
| epage wrote:
| Likely, comparing on `char` ('+') would be slower as it
| requires decoding the `&str` into a `char`, which comes with
| some significant overhead (I've seen 9% on a fairly optimized
| parser). Ideally, when your grammar is 7-bit ASCII (or any
| 8-bit UTF-8 values are opaque to your grammar), you instead
| parse on `&[u8]` and do `u8` comparisons, rather than `char`
| or `&str` comparisons.
| VGHN7XDuOXPAzol wrote:
| > what happens when there is an error reading the file?
|
| the question mark `?` denotes the fact that the error is
| bubbled up (kind of like an exception, but with stronger typing
| and less silent)
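| A small sketch of that bubbling, assuming a read_input_file
| helper like the article's (the exact signature here is a
| guess):

```rust
use std::fs;
use std::io;

// `?` forwards the Err to the caller instead of panicking;
// the caller decides what to do with it.
fn read_input_file(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?;
    Ok(contents)
}

fn main() {
    // A missing file surfaces as an Err value, not a crash.
    match read_input_file("definitely_missing.txt") {
        Ok(s) => println!("read {} bytes", s.len()),
        Err(e) => println!("error: {:?}", e.kind()),
    }
}
```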
| catfacts wrote:
| Thanks for the info. I imagine that in this case, since it
| seems the error is not captured, it should end up producing a
| panic. So a question mark is used when the expected result is
| of type Result or Error. Also the web page
| https://doc.rust-lang.org/rust-by-example/error/result.html
| describes the Result type as Ok(T) or Err(E), and indicates
| that it is a richer version of Option.
| VGHN7XDuOXPAzol wrote:
| Yeah, if `main` returns an error I think it exits with an
| error code and prints it out, so it's quite similar to a
| panic.
|
| I think the blog post is not focusing too much on error
| handling, but in any case this is 'safe'; it just could likely
| be handled better in a real-world case.
| tialaramex wrote:
| Specifically, the ? operator is currently implemented via the
| operator trait Try: Try::branch() gets you a ControlFlow.
|
| If Try::branch gives us a ControlFlow::Break, we're done here:
| return immediately with the value wrapped by Break [if any]
| inside an Err. Otherwise we have ControlFlow::Continue
| wrapping a value we can use to continue with execution of
| this function.
|
| This is type checked, so if the function says it returns
| Result<Goose, Dalek> then the type of the value wrapped in a
| ControlFlow::Break had better be Err(Dalek) or else we can't
| use our ? operator here.
|
| Reifying ControlFlow here separates concerns properly - if we
| want to stop early _successfully_ then control flow can
| represent that idea just fine whereas an Exception model ties
| early exit to failure.
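| A rough sketch of that desugaring: the real Try trait is
| unstable, so this illustrates the same shape with stable
| ControlFlow and a stand-in branch() function:

```rust
use std::ops::ControlFlow;

// Stand-in for Try::branch on Result: Ok continues, Err breaks.
fn branch<T, E>(r: Result<T, E>) -> ControlFlow<E, T> {
    match r {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(e),
    }
}

fn half(n: i32) -> Result<i32, String> {
    let checked: Result<i32, String> =
        if n % 2 == 0 { Ok(n) } else { Err("odd".to_string()) };
    // `let v = checked?;` behaves roughly like this expansion:
    let v = match branch(checked) {
        ControlFlow::Continue(v) => v,
        ControlFlow::Break(e) => return Err(e),
    };
    Ok(v / 2)
}

fn main() {
    assert_eq!(half(8), Ok(4));
    assert_eq!(half(3), Err("odd".to_string()));
    println!("ok");
}
```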
| phi-go wrote:
| The ? will directly return Err if there is one during
| read_input_file(). This is just some syntactic sugar.
| biorach wrote:
| > I should stop comparing Rust to CL, better learn Rust first
|
| Yes
| amelius wrote:
| Can somebody explain this line:
|
|     n => Token::Operand(n.parse().unwrap()),
|
| How does the compiler derive the type of n?
| tialaramex wrote:
| We're doing a pattern match, so this variable n has to be
| something that matches the entire value matched; its type will
| be identical to the type of the value matched, s, a few lines
| earlier.
|
| That value is an item from the iterator we got from calling
| split_whitespace(), and split_whitespace() returns a
| SplitWhitespace, a custom iterator whose items are themselves
| sub-strings of the input string with (no surprise) no
| whitespace in them. In Rust's terminology these are &str,
| references to a string slice.
|
| So, the type is &str
| amelius wrote:
| Aha. But what type does n.parse() have then, and how does the
| compiler derive it?
| ninkendo wrote:
| That function is returning a Vec<Token>, and so it knows
| the .collect() call needs to return a Vec<Token>, and so
| therefore the .map() function needs to return a Token.
| Therefore each match arm needs to return a Token too, so
| therefore the compiler selects the implementation of
| .parse() that returns Token.
|
| I admit when I started Rust, seeing calls to .parse() was
| one of the more confusing things I saw in Rust code,
| because of how much it leans on type inference. In places
| like these, it's a bit more readable:
|
|     let ip: IpAddr = ip_str.parse()?;
|
| But when you see the .parse buried several levels deep and
| you have no idea what type it's trying to produce, it's a
| pain in the ass to read. This is why it's nice to use the
| turbofish syntax:
|
|     let ip = ip_str.parse::<IpAddr>()?;
|
| You can drop .parse::<IpAddr>()? anywhere to make the
| type explicit, especially when buried in type-inferred
| blocks like the code in TFA.
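| The two spellings side by side, as a runnable sketch:

```rust
use std::net::IpAddr;

fn main() {
    // Two equivalent ways to pin down the type `parse` produces:
    let a: IpAddr = "127.0.0.1".parse().unwrap(); // type annotation
    let b = "127.0.0.1".parse::<IpAddr>().unwrap(); // turbofish
    assert_eq!(a, b);
    println!("{}", a);
}
```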
| tialaramex wrote:
| In _this_ case the compiler actually first wants the type
| for a parameter to the function Token::Operand
|
| That function is not shown, but it is included in the full
| source code which was linked. Well, technically we need to
| know that Rust says if a sum type has a variant Token::Operand
| with an associated value, we can always call a function to
| make a Token::Operand with that value, and it just names this
| function Token::Operand too.
|
| So, Token::Operand takes an i32, a 32-bit signed integer.
| The compiler knows we're eventually getting an i32 to call
| this function, if not our program isn't valid.
|
| Which means n.parse().unwrap() has the type i32
|
| We know n is an &str, the &str type has a _generic_
| function parse(), with the following signature:
| pub fn parse<F>(&self) -> Result<F, <F as FromStr>::Err>
| where F: FromStr
|
| So the type you now care about, that of n.parse() has to be
| Result of some kind, and we're going to call
| Result::unwrap() on that, to get an i32
|
| This can only work if the type F in that generic signature
| above is i32
|
| Which means the new type you care about was Result<i32,
| ParseIntError> and the parse function called will be the
| one which makes Ok(i32) when presented an in-range integer.
|
| Edited: Word-smithing, no significant change of meaning.
| phi-go wrote:
| Operand(u32) (see the definition of Token); n will be parsed
| as a u32. See here:
| https://doc.rust-lang.org/std/primitive.str.html#method.pars...
| VGHN7XDuOXPAzol wrote:
| maybe this isn't the question you meant to ask, but:
|
| `n` has the same type as the input of the `match` block. In
| other words, it's a fallback case. (In this case, it's `&str`;
| the same as `"+"`, `"-"`, etc)
|
| If you're wondering how `n.parse().unwrap()` has its type
| computed, well that part is because type inference is able to
| look at the definition of `Token::Operand(u32)` and discover
| that it's `u32`.
|
| From my experience: the compiler can do this as long as the
| first usage of the unknown-typed thing gives it a type. If the
| first usage of it _doesn't_, then it won't try any harder to
| infer the type, and it won't compile unless you add your own
| annotations.
| VGHN7XDuOXPAzol wrote:
| Might also be useful for me to link to the docs for `parse`
| [1] and to the trait `FromStr` [2] that it relies on:
|
| [1]: https://doc.rust-lang.org/std/primitive.str.html#method.pars...
| [2]: https://doc.rust-lang.org/std/str/trait.FromStr.html
| tialaramex wrote:
| That's an i32, not a u32 - apparently the operands are allowed
| to be, say, -1234, not only positive numbers.
| felixnavid wrote:
| `n` is the same type as `s` from "match s"; `n` is just `s`
| renamed, if none of the previous conditions matched.
|
| Because `match <exp>` can contain an expression, you might
| need to handle a "catch all" case where you can refer to the
| result of that expression.
|
| The code could have been `match s.doSomething() { ...`. The
| lines above what you have quoted just compare the result to a
| couple of constants. If none match, the line that you have
| quoted is equivalent to renaming the result of that expression
| to `n` and then handling that case.
| sondr3 wrote:
| If you've never been exposed to a Hindley-Milner type
| system[1] it can seem a bit magical, but it essentially works
| by figuring out the types from the inside out, inferring from
| usage all the way to the top. The type of `n` itself is
| `&str`, but I take it you mean the parsing. `n.parse()` can be
| anything that implements `FromStr`, but `Token::Operand` can
| only take a `u32`, so the compiler can immediately infer that
| the result of `n.parse().unwrap()` must be `u32` (`n.parse()`
| is a `Result<u32, Err>`).
|
| [1]:
| https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...
| conaclos wrote:
| I am a bit surprised that the author didn't try to implement a
| stream parser. This could avoid loading the entire file in memory
| or relying on OS features like memory-mapped files.
| dominicrose wrote:
| A math expression is basically a tree but represented here as a
| string in a way that's probably impossible to stream.
| zokier wrote:
| > We're paying a cost for each split_whitespace call, which
| allocates intermediate slices.
|
| This part seems a bit confused; I don't think
| `split_whitespace` does any allocations. I wish there were a
| few intermediate steps here, e.g. going from &str and
| split_whitespace to &[u8] and split.
|
| The tokenizer at that point is a bit clunky; it is not really
| comparable to split_whitespace. The new tokenizer doesn't
| actually have any whitespace handling; it just assumes that
| every token is followed by exactly one whitespace character.
| That alone might explain some of the speedup.
| tialaramex wrote:
| > I don't think `split_whitespace` does any allocations.
|
| Correct. Here's the implementation of split_whitespace:
|
|     pub fn split_whitespace(&self) -> SplitWhitespace<'_> {
|         SplitWhitespace { inner: self.split(IsWhitespace).filter(IsNotEmpty) }
|     }
|
| So, we're just calling split(IsWhitespace).filter(IsNotEmpty)
| and keeping the resulting iterator.
|
| Rust's iterators are lazy, they only do work when asked for the
| next item, so their internal state is only what is necessary to
| keep doing that each time.
|
| IsWhitespace and IsNotEmpty are both predicates which do
| exactly what you think they do. They're provided in the
| library because they might not get inlined, and if they
| don't, we might as well only implement them exactly once.
| brians wrote:
| Can you help me understand what's happening between the split
| and the filter on "a <space> <space> <space> b"? I expect
| that to be a series of calls to split, each yielding an empty
| slice. So the whole iterator yields a slice pointing at a,
| then a slice pointing at b--but it's had to handle three
| intermediate slices to get the b. Right?
| LegionMammal978 wrote:
| It creates a Split<'_> iterator using the IsWhitespace
| function as the pattern. As the user calls .next() on the
| outer SplitWhitespace<'_>, it calls .next() on the inner
| Split<'_>, which yields slices "a", "", "", and "b", and
| the filtered iterator reduces them to "a" and "b".
|
| (But as mentioned, this doesn't perform any allocations,
| since each slice is just a pointer + length into the
| original string.)
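| The behavior being described, as a runnable sketch:

```rust
fn main() {
    let s = "a   b"; // three spaces between a and b
    // `split` on a whitespace predicate yields the empty slices too...
    let raw: Vec<&str> = s.split(char::is_whitespace).collect();
    assert_eq!(raw, ["a", "", "", "b"]);
    // ...while `split_whitespace` filters them out. Neither allocates
    // per item: each &str is just a pointer + length into `s`.
    let words: Vec<&str> = s.split_whitespace().collect();
    assert_eq!(words, ["a", "b"]);
    println!("{}", words.len());
}
```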
| epage wrote:
| Likely the reason `split_whitespace` is so slow is
|
| > 'Whitespace' is defined according to the terms of the Unicode
| Derived Core Property White_Space.
|
| If they used `split_ascii_whitespace` things would likely be
| faster.
|
| Switching parsing from `&str` to `&[u8]` can offer other
| benefits. In their case, they do `&str` comparisons and are
| switching that to a `u8` comparison. A lot of other parsers are
| doing `char` comparisons which requires decoding a `&str` to a
| `char` which can be expensive and is usually not needed because
| most grammars can be parsed as `&[u8]` just fine.
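| A sketch of the byte-level comparison described here
| (`count_ops` is an illustrative helper, not the article's
| code): for an ASCII grammar, compare `u8` values directly
| instead of decoding `char`s.

```rust
// Count operator bytes by direct u8 comparison; no char decoding.
fn count_ops(input: &[u8]) -> usize {
    input.iter().filter(|&&b| b == b'+' || b == b'*').count()
}

fn main() {
    let n = count_ops(b"1 + 2 * 3 + 4");
    assert_eq!(n, 3);
    println!("{}", n);
}
```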
| ethan_smith wrote:
| You're right - split_whitespace returns an iterator that yields
| string slices (&str) which are just views into the original
| string without allocation, though the performance difference
| likely comes from avoiding the iterator indirection and
| boundary checks.
| nurettin wrote:
| I have observed that "fearless concurrency" didn't really do much
| in this case compared to basic practices like not allocating in
| tight loops.
| pornel wrote:
| I don't think memory mapping does anything to prevent false
| sharing. All threads still get the same data at the same address.
| You may get page alignment for the file, but the free-form data
| in the file still crosses page boundaries and cache lines.
|
| Also you don't get contention when you don't write to the memory.
|
| The speedup may be from just starting the work before the whole
| file is loaded, allowing the OS to prefetch the rest in parallel.
|
| You probably would get the same result if you loaded the file in
| smaller chunks.
| librasteve wrote:
| thought it would be fun to write this in raku
|
|     grammar Arithmetic {
|         rule TOP    { ^ <expr> $ }
|         rule expr   { <term>+ % ['+' | '-'] }
|         rule term   { <value> }
|         rule value  { <number> | <parens> }
|         rule number { \d+ }
|         rule parens { '(' <expr> ')' }
|     }
| combinator_y wrote:
| I am wondering if there is a different approach that 'peaks'
| better in terms of perf. Instead of doing:
|
| - Optimization 1: Do not allocate a Vector when tokenizing
| - Optimization 2: Zero allocations -- parse directly from the input bytes
| - Optimization 3: Do not use Peekable
| - Optimization 4: Multithreading and SIMD
| - Optimization 5: Memory-mapped I/O
|
| Example:
|
| - Optimization 1: Memory-mapped I/O
| - Optimization 2: Do not use Peekable
| - Optimization 3: Do not allocate a Vector when tokenizing
| - Optimization 4: Zero allocations -- parse directly from the input bytes
| - Optimization 5: Multithreading and SIMD
|
| I might be guessing, but in this order, probably by
| Optimization 3 you would already reach a high enough throughput
| that you wouldn't bother with manual SIMD or multithreading.
| (This is the pragmatic way; in real life you will try to
| minimize risk and reach the goal as fast as possible, and
| SIMD/multithreading carry a lot of risk for your average dev
| team.)
| deathlock wrote:
| > I might be guessing, but in this order probably by
| Optimization 3 you would reach already a high throughput that
| you wouldn't bother with manual simd nor Multithreading.
|
| I agree with you, though from my experience memory mapping is
| only useful if you need to jump through the file or read it
| multiple times (as is the case after the author added SIMD and
| a two-pass step, the first to identify whitespace and the
| second to parse the operators and the operands). If you just
| need to read the file once, it's better to avoid memory
| mapping as it adds a little overhead.
|
| On the other hand, parsing directly from the input bytes,
| avoiding the UTF-8 validation needed to get a &str, is easy
| enough to do but still improves performance quite a bit. Even
| the Rust csv crate, which does much more, is around 30% faster
| with this optimization.
| https://docs.rs/csv/latest/csv/tutorial/index.html#amortizin...
|
| This is to say, my list for "easy optimizations, big gains"
| would be: 1) Do not allocate a Vector -- 2) Do not use
| Peekable -- 3) Avoid UTF-8 validation. I'm still guessing, but
| I think memory mapping can be skipped, and might be worth it
| only if you plan on also implementing SIMD.
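| A sketch of what "avoid UTF-8 validation" can look like
| (`parse_ascii_u32` is a hypothetical helper, not the article's
| code): parse an ASCII integer straight from raw bytes instead
| of validating into a &str first.

```rust
// Parse an unsigned decimal integer directly from bytes,
// with overflow checking; no &str / UTF-8 validation involved.
fn parse_ascii_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() {
        return None;
    }
    let mut n: u32 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        n = n.checked_mul(10)?.checked_add((b - b'0') as u32)?;
    }
    Some(n)
}

fn main() {
    assert_eq!(parse_ascii_u32(b"1234"), Some(1234));
    assert_eq!(parse_ascii_u32(b"12a4"), None);
    println!("{:?}", parse_ascii_u32(b"90"));
}
```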
| IshKebab wrote:
| Every time I see people use flamegraphs it's the ancient Perl
| version. There's a much better version!!!
|
| Use the Go version of pprof: https://github.com/google/pprof
|
| Run it like `pprof -http : your_profile.out` and it will open a
| browser with a really nice interactive flamegraph (way better
| than the Perl version), plus a call graph, source line profiling,
| top functions, etc. etc.
|
| It's so much better. Don't use the Perl version. I should
| probably write a post showing how to do this.
|
| Another also-much-better alternative is Samply
| (https://github.com/mstange/samply) which uses the Firefox
| Profiler as a GUI. I don't like it quite as much as pprof but
| it's clearly still much better than what's in this article:
|
| https://share.firefox.dev/3j3PJoK
| erk__ wrote:
| It should be noted that even though the post links to the perl
| version for some reason, it is actually not what cargo
| flamegraph [0] uses, it uses a reimplementation of it in Rust
| called inferno [1].
|
| [0]: https://github.com/flamegraph-rs/flamegraph
|
| [1]: https://github.com/jonhoo/inferno
| IshKebab wrote:
| Ah interesting. It seems to be no better than the Perl one
| though, except not requiring Perl. It's still a static SVG.
| porphyra wrote:
| It's not a static SVG though. The SVG supports
| interactivity. You can click on each element to zoom in and
| it even has a search feature.
| IshKebab wrote:
| Ah really? Their example here doesn't do that:
| https://github.com/jonhoo/inferno/blob/main/tests/data/flame...
|
| But even so, pprof's is better. (You'll have to try it or
| take my word for it; they don't seem to have a demo
| anywhere unfortunately.)
|
| When you hover a function it highlights all the other
| calls to that function (in different stacks), and if you
| click it it shows all the calls to and from that function
| in all stacks with two-sided flame graph.
| porphyra wrote:
| The one at
| https://github.com/flamegraph-rs/flamegraph/blob/main/exampl...
| supports both zooming and search.
| IshKebab wrote:
| Ah I see. Yeah pprof is significantly superior.
| the-wumpus wrote:
| pprof doesn't do an amazing job of explaining how to use it
| with perf (which you'd need for a Rust project like the OP's),
| so:
|
| First install perf, graphviz, perf_data_converter and ofc
| pprof, then generate the data with `perf record [command]`,
| and display it with `pprof -http=: perf.data`.
| IshKebab wrote:
| I typically use gperftools for profiling instead of perf. You
| can LD_PRELOAD it.
| remram wrote:
| That repo has no builds and no releases, kind of surprising?
| And needs another tool to consume perf data?
|
| edit: And I can only build it using bazel, and I need bazel to
| build bazel? I think I'll stick with Perl...
| IshKebab wrote:
| I guess you didn't get very far in the README, because near
| the top it tells you how to install it. It's a single command:
|
|     go install github.com/google/pprof@latest
| jlarocco wrote:
| Not everybody wants to install Go just to get an
| application.
| remram wrote:
| I got very far in the README, bazel is required for the
| other repo (perf_data_converter).
| jdreaver wrote:
| I recently discovered that `perf` itself can spit out
| flamegraphs. My workflow has been:
|
|     $ perf record -g -F 99 ./my-program
|     $ perf script report flamegraph
|
| You can also run `perf script -F +pid > out.perf` and then open
| `out.perf` in Firefox's built-in profile viewer (which is super
| neat) https://profiler.firefox.com
| taeric wrote:
| I'm somewhat curious whether these optimizations would all
| have roughly the same impact if done in other orders. The
| presentation certainly makes it look like creating a big list
| of tokens is always the culprit here. That seems somewhat
| expected, so I agree with the text; but I still wonder if the
| other optimizations are best looked at in terms of percentage
| gains or absolute gains here.
|
| Neat write up! Kudos on that.
| noelwelsh wrote:
| I feel like I'm taking crazy pills. It's not a parser, but a
| fused parser AND interpreter. This changes the game considerably!
| It doesn't have to produce an intermediate AST, and therefore can
| avoid the majority of the allocation that most parsers will
| perform.
|
| However, avoiding creating the AST is not very realistic for
| most uses. It's usually needed to perform optimizations, or
| even just for more complicated languages that have interesting
| control flow.
___________________________________________________________________
(page generated 2025-07-10 23:01 UTC)