[HN Gopher] Optimizing a Math Expression Parser in Rust
___________________________________________________________________
Optimizing a Math Expression Parser in Rust
Author : serial_dev
Score : 121 points
Date : 2025-07-10 09:27 UTC (13 hours ago)
(HTM) web link (rpallas.xyz)
(TXT) w3m dump (rpallas.xyz)
| tialaramex wrote:
| This reminds me I should actually write a "natural" arithmetic
| expression parser for my Rust crate realistic
|
| Right now, realistic can parse "(* (^ 40 0.5) (^ 90 0.5))" and it
| will tell you that's 60, because yeah, it's sixty, that's how
| real arithmetic works.
|
| But it would be nice to write "(40^0.5) * (90^0.5)" or similar
| and have that work instead or as well. The months of work on
| realistic meant I spent so long without a "natural" parser that I
| got used to this.
| thrance wrote:
| I think having a Polish notation parser is good enough for
| math-y applications, I wouldn't worry about it too much if I
| were you. Nice crate by the way!
| ramon156 wrote:
| This would be a perfect class project. First lesson is letting
| them go loose; the second lesson would be to find out which
| optimizations they used and which are available
| makapuf wrote:
| I very much like these optimization articles. What could be
| interesting is to look at benchmarks with not only wall time
| but also other metrics:
|
| - cpu time (better CPU usage can mean shorter wall time but
| higher CPU)
|
| - memory usage
|
| - but also and maybe more interestingly complexity of code (not
| an absolute metric, but a very complex/non portable code for 5%
| speedup may or may not be worth it)
|
| EDIT: formatting
| catfacts wrote:
| I am not even a newbie in Rust, and this could be just
| nitpicking, but it seems that match is comparing strings and
| not characters. If that is the case, then I think Common Lisp
| can optimize more, since there is a special comparison for
| characters in CL.
|
| Edited: In the optimized version the author uses bytes and
| generators and avoids using strings. I don't know if Rust
| generators are optimized for speed or memory; ideally you could
| define the length of the buffer according to the memory cache
| available.
|
| Edited: I find it strange to use input = read_input_file()? and
| then eval(&input); what happens when there is an error reading
| the file? Rust is supposed to be a high-security language. In CL
| there are keywords like :if-does-not-exist to decide what to do,
| and read also accepts additional parameters for end-of-file and
| for expressing that this read is nested inside another read.
|
| I should stop comparing Rust to CL; better to learn Rust first.
| I consider this kind of article a very good way of learning
| Rust for those interested in parsing and optimization. Rust
| seems to be a very nice language when you can afford the time
| to develop your program.
| VGHN7XDuOXPAzol wrote:
| If you wanted to match on characters (`char`s), you could do
| this with single quotes (`'+'`).
|
| Or if you wanted to do it on bytes, you could use byte
| literals (`b'+'`).
|
| Unsure if that would provide a meaningful boost or not
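| A minimal sketch of the byte-literal form (`classify` is a
| hypothetical helper, not code from the article; the `char`
| version would use '+' patterns on a char instead):

```rust
// Match on raw bytes with byte-literal patterns (b'+').
fn classify(b: u8) -> &'static str {
    match b {
        b'+' => "plus",
        b'*' => "star",
        b'0'..=b'9' => "digit",
        _ => "other",
    }
}

fn main() {
    assert_eq!(classify(b'+'), "plus");
    assert_eq!(classify(b'7'), "digit");
    println!("{}", classify(b'*'));
}
```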
| catfacts wrote:
| Thanks for all the information you provided. I will read Rust
| by Example and stop posting in this thread to avoid deviating
| from the OP. Anyway, perhaps other readers are learning Rust
| and having the same questions in their minds, so your answers
| are also welcome for them.
|
| Edited: I will eliminate my catfacts username (changing the
| password to a random one). I don't like being downvoted, and I
| know I should not mention it, but things are what they are.
| Goodbye catfacts!
| epage wrote:
| Likely, comparing on `char` ('+') would be slower as it
| requires decoding the `&str` into a `char`, which comes with
| some significant overhead (I've seen 9% on a fairly optimized
| parser). Ideally, when your grammar is 7-bit ASCII (or any
| 8-bit UTF-8 values are opaque to your grammar), you instead
| parse on `&[u8]` and do `u8` comparisons, rather than `char`
| or `&str` comparisons.
| VGHN7XDuOXPAzol wrote:
| > what happens when there is an error reading the file?
|
| the question mark `?` denotes the fact that the error is
| bubbled up (kind of like an exception, but with stronger typing
| and less silent)
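| A small sketch of that bubbling, assuming a read_input_file
| helper like the article's (the exact signature here is a
| guess):

```rust
use std::fs;
use std::io;

// `?` forwards the Err to the caller instead of panicking;
// the caller decides what to do with it.
fn read_input_file(path: &str) -> Result<String, io::Error> {
    let contents = fs::read_to_string(path)?;
    Ok(contents)
}

fn main() {
    // A missing file surfaces as an Err value, not a crash.
    match read_input_file("definitely_missing.txt") {
        Ok(s) => println!("read {} bytes", s.len()),
        Err(e) => println!("error: {:?}", e.kind()),
    }
}
```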
| catfacts wrote:
| Thanks for the info. I imagine that in this case, since it
| seems the error is not captured, it should end up producing a
| panic. So a question mark is used when the expected result is
| of type Result or Error. Also the web page
| https://doc.rust-lang.org/rust-by-example/error/result.html
| describes the Result type as Ok(T) or Err(E), and indicates
| that it is a richer version of Option.
| VGHN7XDuOXPAzol wrote:
| Yeah, if `main` returns an error I think it exits with an
| error code and prints it out, so it's quite similar to a
| panic.
|
| I think the blog post is not focusing too much on error
| handling, but in any case this is 'safe'; it just could likely
| be handled better in a real-world case.
| tialaramex wrote:
| Specifically, the ? operator is currently implemented via the
| operator trait Try: Try::branch() gets you a ControlFlow.
|
| If Try::branch gives us a ControlFlow::Break, we're done here:
| return immediately with the value wrapped by Break [if any]
| inside an Err. Otherwise we have ControlFlow::Continue
| wrapping a value we can use to continue with execution of
| this function.
|
| This is type checked, so if the function says it returns
| Result<Goose, Dalek> then the type of the value wrapped in a
| ControlFlow::Break had better be Err(Dalek) or else we can't
| use our ? operator here.
|
| Reifying ControlFlow here separates concerns properly - if we
| want to stop early _successfully_ then control flow can
| represent that idea just fine whereas an Exception model ties
| early exit to failure.
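| A rough sketch of that desugaring: the real Try trait is
| unstable, so this illustrates the same shape with stable
| ControlFlow and a stand-in branch() function:

```rust
use std::ops::ControlFlow;

// Stand-in for Try::branch on Result: Ok continues, Err breaks.
fn branch<T, E>(r: Result<T, E>) -> ControlFlow<E, T> {
    match r {
        Ok(v) => ControlFlow::Continue(v),
        Err(e) => ControlFlow::Break(e),
    }
}

fn half(n: i32) -> Result<i32, String> {
    let checked: Result<i32, String> =
        if n % 2 == 0 { Ok(n) } else { Err("odd".to_string()) };
    // `let v = checked?;` behaves roughly like this expansion:
    let v = match branch(checked) {
        ControlFlow::Continue(v) => v,
        ControlFlow::Break(e) => return Err(e),
    };
    Ok(v / 2)
}

fn main() {
    assert_eq!(half(8), Ok(4));
    assert_eq!(half(3), Err("odd".to_string()));
    println!("ok");
}
```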
| phi-go wrote:
| The ? will directly return Err if there is one during
| read_input_file(). This is just some syntactic sugar.
| biorach wrote:
| > I should stop comparing Rust to CL, better learn Rust first
|
| Yes
| amelius wrote:
| Can somebody explain this line:
|
|     n => Token::Operand(n.parse().unwrap()),
|
| How does the compiler derive the type of n?
| tialaramex wrote:
| We're doing a pattern match, so this variable n has to be
| something that matches the entire value matched; its type will
| be identical to the type of the value matched, s, a few lines
| earlier.
|
| That value is an item from the iterator we got from calling
| split_whitespace(), and split_whitespace() returns a
| SplitWhitespace, a custom iterator whose items are themselves
| sub-strings of the input string with (no surprise) no
| whitespace in them. In Rust's terminology these are &str,
| references to a string slice.
|
| So, the type is &str
| amelius wrote:
| Aha. But what type does n.parse() have then, and how does the
| compiler derive it?
| ninkendo wrote:
| That function is returning a Vec<Token>, and so it knows
| the .collect() call needs to return a Vec<Token>, and so
| therefore the .map() function needs to return a Token.
| Therefore each match arm needs to return a Token too, so
| therefore the compiler selects the implementation of
| .parse() that returns Token.
|
| I admit when I started Rust, seeing calls to .parse() was
| one of the more confusing things I saw in Rust code,
| because of how much it leans on type inference. In places
| like these, it's a bit more readable:
|
|     let ip: IpAddr = ip_str.parse()?;
|
| But when you see the .parse buried several levels deep and
| you have no idea what type it's trying to produce, it's a
| pain in the ass to read. This is why it's nice to use the
| turbofish syntax:
|
|     let ip = ip_str.parse::<IpAddr>()?;
|
| You can drop .parse::<IpAddr>()? anywhere to make the
| type explicit, especially when buried in type-inferred
| blocks like the code in TFA.
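| The two spellings side by side, as a runnable sketch:

```rust
use std::net::IpAddr;

fn main() {
    // Two equivalent ways to pin down the type `parse` produces:
    let a: IpAddr = "127.0.0.1".parse().unwrap(); // type annotation
    let b = "127.0.0.1".parse::<IpAddr>().unwrap(); // turbofish
    assert_eq!(a, b);
    println!("{}", a);
}
```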
| tialaramex wrote:
| In _this_ case the compiler actually first wants the type
| for a parameter to the function Token::Operand
|
| That function is not shown, but it is included in the full
| source code which was linked. Well, technically we need to
| know that Rust says if a sum type has a variant Token::Operand
| with an associated value, we can always call a function to
| make a Token::Operand with that value, and it just names this
| function Token::Operand too.
|
| So, Token::Operand takes an i32, a 32-bit signed integer.
| The compiler knows we're eventually getting an i32 to call
| this function, if not our program isn't valid.
|
| Which means n.parse().unwrap() has the type i32
|
| We know n is an &str, the &str type has a _generic_
| function parse(), with the following signature:
| pub fn parse<F>(&self) -> Result<F, <F as FromStr>::Err>
| where F: FromStr
|
| So the type you now care about, that of n.parse() has to be
| Result of some kind, and we're going to call
| Result::unwrap() on that, to get an i32
|
| This can only work if the type F in that generic signature
| above is i32
|
| Which means the new type you care about was Result<i32,
| ParseIntError> and the parse function called will be the
| one which makes Ok(i32) when presented an in-range integer.
|
| Edited: Word-smithing, no significant change of meaning.
| phi-go wrote:
| Operand(u32) (see the definition of Token); n will be parsed
| as a u32. See here:
| https://doc.rust-lang.org/std/primitive.str.html#method.pars...
| VGHN7XDuOXPAzol wrote:
| maybe this isn't the question you meant to ask, but:
|
| `n` has the same type as the input of the `match` block. In
| other words, it's a fallback case. (In this case, it's `&str`;
| the same as `"+"`, `"-"`, etc)
|
| If you're wondering how `n.parse().unwrap()` has its type
| computed, well that part is because type inference is able to
| look at the definition of `Token::Operand(u32)` and discover
| that it's `u32`.
|
| From my experience: the compiler can do this as long as the
| first usage of the unknown-typed thing gives it a type. If the
| first usage of it _doesn't_, then it won't try any harder to
| infer the type, and it won't compile unless you add your own
| annotations.
| VGHN7XDuOXPAzol wrote:
| Might also be useful for me to link to the docs for `parse`
| [1] and to the trait `FromStr` [2] that it relies on:
|
| [1]: https://doc.rust-lang.org/std/primitive.str.html#method.pars...
| [2]: https://doc.rust-lang.org/std/str/trait.FromStr.html
| tialaramex wrote:
| That's an i32, not a u32 - apparently the operands are allowed
| to be, say, -1234, not only positive numbers.
| felixnavid wrote:
| `n` is the same type as `s` from "match s"; `n` is just `s`
| renamed, if none of the previous conditions matched.
|
| Because `match <exp>` can contain an expression, you might
| need to handle a "catch all" case where you can refer to the
| result of that expression.
|
| The code could have been `match s.doSomething() { ...`. The
| lines above what you have quoted just compare the result to a
| couple of constants. If none match, the line that you have
| quoted is equivalent to renaming the result of that expression
| to `n` and then handling that case.
| sondr3 wrote:
| If you've never been exposed to a Hindley-Milner type
| system[1] it can seem a bit magical, but it essentially works
| by figuring out the types from the inside out, inferring from
| usage all the way to the top. The type of `n` itself is
| `&str`, but I take it you mean the parsing. `n.parse()` can be
| anything that implements `FromStr`, but `Token::Operand` can
| only take a `u32`, so the compiler can immediately infer that
| the result of `n.parse().unwrap()` must be `u32` (`n.parse()`
| is a `Result<u32, Err>`).
|
| [1]:
| https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_sy...
| conaclos wrote:
| I am a bit surprised that the author didn't try to implement a
| stream parser. This could avoid loading the entire file in memory
| or relying on OS features like memory-mapped files.
| dominicrose wrote:
| A math expression is basically a tree but represented here as a
| string in a way that's probably impossible to stream.
| zokier wrote:
| > We're paying a cost for each split_whitespace call, which
| allocates intermediate slices.
|
| This part seems a bit confused; I don't think
| `split_whitespace` does any allocations. I wish there were a
| few intermediate steps here, e.g. going from &str and
| split_whitespace to &[u8] and split.
|
| The tokenizer at that point is a bit clunky; it is not really
| comparable to split_whitespace. The new tokenizer doesn't
| actually have any whitespace handling; it just assumes that
| every token is followed by exactly one whitespace character.
| That alone might explain some of the speedup.
| tialaramex wrote:
| > I don't think `split_whitespace` does any allocations.
|
| Correct. Here's the implementation of split_whitespace:
|
|     pub fn split_whitespace(&self) -> SplitWhitespace<'_> {
|         SplitWhitespace { inner: self.split(IsWhitespace).filter(IsNotEmpty) }
|     }
|
| So, we're just calling split(IsWhitespace).filter(IsNotEmpty)
| and keeping the resulting iterator.
|
| Rust's iterators are lazy, they only do work when asked for the
| next item, so their internal state is only what is necessary to
| keep doing that each time.
|
| IsWhitespace and IsNotEmpty are both predicates which do
| exactly what you think they do. They're provided in the
| library because they might not get inlined, and if they
| don't, we might as well only implement them exactly once.
| brians wrote:
| Can you help me understand what's happening between the split
| and the filter on "a <space> <space> <space> b"? I expect
| that to be a series of calls to split, each yielding an empty
| slice. So the whole iterator yields a slice pointing at a,
| then a slice pointing at b--but it's had to handle three
| intermediate slices to get the b. Right?
| LegionMammal978 wrote:
| It creates a Split<'_> iterator using the IsWhitespace
| function as the pattern. As the user calls .next() on the
| outer SplitWhitespace<'_>, it calls .next() on the inner
| Split<'_>, which yields slices "a", "", "", and "b", and
| the filtered iterator reduces them to "a" and "b".
|
| (But as mentioned, this doesn't perform any allocations,
| since each slice is just a pointer + length into the
| original string.)
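| The behavior being described, as a runnable sketch:

```rust
fn main() {
    let s = "a   b"; // three spaces between a and b
    // `split` on a whitespace predicate yields the empty slices too...
    let raw: Vec<&str> = s.split(char::is_whitespace).collect();
    assert_eq!(raw, ["a", "", "", "b"]);
    // ...while `split_whitespace` filters them out. Neither allocates
    // per item: each &str is just a pointer + length into `s`.
    let words: Vec<&str> = s.split_whitespace().collect();
    assert_eq!(words, ["a", "b"]);
    println!("{}", words.len());
}
```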
| epage wrote:
| Likely the reason `split_whitespace` is so slow is
|
| > 'Whitespace' is defined according to the terms of the Unicode
| Derived Core Property White_Space.
|
| If they used `split_ascii_whitespace` things would likely be
| faster.
|
| Switching parsing from `&str` to `&[u8]` can offer other
| benefits. In their case, they do `&str` comparisons and are
| switching that to a `u8` comparison. A lot of other parsers are
| doing `char` comparisons which requires decoding a `&str` to a
| `char` which can be expensive and is usually not needed because
| most grammars can be parsed as `&[u8]` just fine.
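| A sketch of the byte-level comparison described here
| (`count_ops` is an illustrative helper, not the article's
| code): for an ASCII grammar, compare `u8` values directly
| instead of decoding `char`s.

```rust
// Count operator bytes by direct u8 comparison; no char decoding.
fn count_ops(input: &[u8]) -> usize {
    input.iter().filter(|&&b| b == b'+' || b == b'*').count()
}

fn main() {
    let n = count_ops(b"1 + 2 * 3 + 4");
    assert_eq!(n, 3);
    println!("{}", n);
}
```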
| ethan_smith wrote:
| You're right - split_whitespace returns an iterator that yields
| string slices (&str) which are just views into the original
| string without allocation, though the performance difference
| likely comes from avoiding the iterator indirection and
| boundary checks.
| nurettin wrote:
| I have observed that "fearless concurrency" didn't really do much
| in this case compared to basic practices like not allocating in
| tight loops.
| pornel wrote:
| I don't think memory mapping does anything to prevent false
| sharing. All threads still get the same data at the same address.
| You may get page alignment for the file, but the free-form data
| in the file still crosses page boundaries and cache lines.
|
| Also you don't get contention when you don't write to the memory.
|
| The speedup may be from just starting the work before the whole
| file is loaded, allowing the OS to prefetch the rest in parallel.
|
| You probably would get the same result if you loaded the file in
| smaller chunks.
| librasteve wrote:
| thought it would be fun to write this in raku
|
|     grammar Arithmetic {
|         rule TOP    { ^ <expr> $ }
|         rule expr   { <term>+ % ['+' | '-'] }
|         rule term   { <value> }
|         rule value  { <number> | <parens> }
|         rule number { \d+ }
|         rule parens { '(' <expr> ')' }
|     }
| combinator_y wrote:
| I am wondering if there is a different approach that 'peaks'
| better in terms of perf. Instead of doing:
|
| - Optimization 1: Do not allocate a Vector when tokenizing
| - Optimization 2: Zero allocations -- parse directly from the input bytes
| - Optimization 3: Do not use Peekable
| - Optimization 4: Multithreading and SIMD
| - Optimization 5: Memory-mapped I/O
|
| Example:
|
| - Optimization 1: Memory-mapped I/O
| - Optimization 2: Do not use Peekable
| - Optimization 3: Do not allocate a Vector when tokenizing
| - Optimization 4: Zero allocations -- parse directly from the input bytes
| - Optimization 5: Multithreading and SIMD
|
| I might be guessing, but in this order, probably by
| Optimization 3 you would already reach a high enough throughput
| that you wouldn't bother with manual SIMD or multithreading.
| (This is the pragmatic way; in real life you will try to
| minimize risk and reach the goal as fast as possible, and
| SIMD/multithreading carry a lot of risk for your average dev
| team.)
| deathlock wrote:
| > I might be guessing, but in this order probably by
| Optimization 3 you would reach already a high throughput that
| you wouldn't bother with manual simd nor Multithreading.
|
| I agree with you, though from my experience memory mapping is
| only useful if you need to jump through the file or read it
| multiple times (as is the case after the author added SIMD and
| a two-pass step, the first to identify whitespace and the
| second to parse the operators and the operands). If you just
| need to read the file once, it's better to avoid memory
| mapping as it adds a little overhead.
|
| On the other hand, parsing directly from the input bytes,
| avoiding the UTF-8 validation needed to get a &str, is easy
| enough to do but still improves performance quite a bit. Even
| the Rust csv crate, which does much more, is around 30% faster
| with this optimization.
| https://docs.rs/csv/latest/csv/tutorial/index.html#amortizin...
|
| This is to say, my list for "easy optimizations, big gains"
| would be: 1) Do not allocate a Vector -- 2) Do not use
| Peekable -- 3) Avoid UTF-8 validation. I'm still guessing, but
| I think memory mapping can be skipped, and might be worth it
| only if you plan on also implementing SIMD.
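| A sketch of what "avoid UTF-8 validation" can look like
| (`parse_ascii_u32` is a hypothetical helper, not the article's
| code): parse an ASCII integer straight from raw bytes instead
| of validating into a &str first.

```rust
// Parse an unsigned decimal integer directly from bytes,
// with overflow checking; no &str / UTF-8 validation involved.
fn parse_ascii_u32(bytes: &[u8]) -> Option<u32> {
    if bytes.is_empty() {
        return None;
    }
    let mut n: u32 = 0;
    for &b in bytes {
        if !b.is_ascii_digit() {
            return None;
        }
        n = n.checked_mul(10)?.checked_add((b - b'0') as u32)?;
    }
    Some(n)
}

fn main() {
    assert_eq!(parse_ascii_u32(b"1234"), Some(1234));
    assert_eq!(parse_ascii_u32(b"12a4"), None);
    println!("{:?}", parse_ascii_u32(b"90"));
}
```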
| IshKebab wrote:
| Every time I see people use flamegraphs it's the ancient Perl
| version. There's a much better version!!!
|
| Use the Go version of pprof: https://github.com/google/pprof
|
| Run it like `pprof -http : your_profile.out` and it will open a
| browser with a really nice interactive flamegraph (way better
| than the Perl version), plus a call graph, source line profiling,
| top functions, etc. etc.
|
| It's so much better. Don't use the Perl version. I should
| probably write a post showing how to do this.
|
| Another also-much-better alternative is Samply
| (https://github.com/mstange/samply) which uses the Firefox
| Profiler as a GUI. I don't like it quite as much as pprof but
| it's clearly still much better than what's in this article:
|
| https://share.firefox.dev/3j3PJoK
| erk__ wrote:
| It should be noted that even though the post links to the perl
| version for some reason, it is actually not what cargo
| flamegraph [0] uses, it uses a reimplementation of it in Rust
| called inferno [1].
|
| [0]: https://github.com/flamegraph-rs/flamegraph
|
| [1]: https://github.com/jonhoo/inferno
| IshKebab wrote:
| Ah interesting. It seems to be no better than the Perl one
| though, except not requiring Perl. It's still a static SVG.
| porphyra wrote:
| It's not a static SVG though. The SVG supports
| interactivity. You can click on each element to zoom in and
| it even has a search feature.
| IshKebab wrote:
| Ah really? Their example here doesn't do that:
| https://github.com/jonhoo/inferno/blob/main/tests/data/flame...
|
| But even so, pprof's is better. (You'll have to try it or
| take my word for it; they don't seem to have a demo
| anywhere unfortunately.)
|
| When you hover a function it highlights all the other
| calls to that function (in different stacks), and if you
| click it it shows all the calls to and from that function
| in all stacks with two-sided flame graph.
| porphyra wrote:
| The one at
| https://github.com/flamegraph-rs/flamegraph/blob/main/exampl...
| supports both zooming and search.
| IshKebab wrote:
| Ah I see. Yeah pprof is significantly superior.
| the-wumpus wrote:
| pprof doesn't do an amazing job of explaining how to use it
| with perf (which you'd need for a Rust project like the OP's),
| so:
|
| First install perf, graphviz, perf_data_converter and ofc
| pprof, then generate the data with `perf record [command]`,
| and display it with `pprof -http=: perf.data`.
| IshKebab wrote:
| I typically use gperftools for profiling instead of perf. You
| can LD_PRELOAD it.
| remram wrote:
| That repo has no builds and no releases, kind of surprising?
| And needs another tool to consume perf data?
|
| edit: And I can only build it using bazel, and I need bazel to
| build bazel? I think I'll stick with Perl...
| IshKebab wrote:
| I guess you didn't get very far in the README, because near
| the top it tells you how to install it. It's a single command:
|
|     go install github.com/google/pprof@latest
| jlarocco wrote:
| Not everybody wants to install Go just to get an
| application.
| remram wrote:
| I got very far in the README, bazel is required for the
| other repo (perf_data_converter).
| jdreaver wrote:
| I recently discovered that `perf` itself can spit out
| flamegraphs. My workflow has been:
|
|     $ perf record -g -F 99 ./my-program
|     $ perf script report flamegraph
|
| You can also run `perf script -F +pid > out.perf` and then open
| `out.perf` in Firefox's built-in profile viewer (which is super
| neat) https://profiler.firefox.com
| taeric wrote:
| I'm somewhat curious whether these optimizations would all
| have roughly the same impact if done in other orders. The
| presentation certainly makes it look like creating a big list
| of tokens is always the culprit here. That seems somewhat
| expected, so I agree with the text; but I still wonder if the
| other optimizations are best looked at in terms of percentage
| gains or absolute gains here.
|
| Neat write up! Kudos on that.
| noelwelsh wrote:
| I feel like I'm taking crazy pills. It's not a parser, but a
| fused parser AND interpreter. This changes the game considerably!
| It doesn't have to produce an intermediate AST, and therefore can
| avoid the majority of the allocation that most parsers will
| perform.
|
| However, avoiding creating the AST is not very realistic for
| most uses. It's usually needed to perform optimizations, or
| even just for more complicated languages that have interesting
| control flow.
___________________________________________________________________
(page generated 2025-07-10 23:01 UTC)