[HN Gopher] Rewriting the Lexer Benchmark in Rust
___________________________________________________________________
Rewriting the Lexer Benchmark in Rust
Author : ibobev
Score : 24 points
Date : 2022-05-30 14:08 UTC (8 hours ago)
(HTM) web link (eli.thegreenplace.net)
(TXT) w3m dump (eli.thegreenplace.net)
| vlmutolo wrote:
| I'd like to see an "owned" version where the tokens hold
| something like a CompactString [0] instead of a &str or String,
| where CompactString is just some type that uses the small-string
| optimization and usually avoids heap allocation. This could
| result in a lifetime-free API with probably only slight
| performance overhead compared to the &str version.
|
| It would also be interesting to see how smol_str [1] stacks up,
| since it was built with tokens in mind. Though I'm not sure how
| helpful it would be in this specific case; one of its primary
| advantages seems to be that it stores whitespace compactly, and I
| don't think the author of the article is preserving whitespace.
|
| [0]: https://docs.rs/compact_str
|
| [1]: https://docs.rs/smol_str
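
A minimal sketch of the "owned" idea above, using only the standard library. `SmallStr`, `INLINE_CAP`, and `OwnedToken` are hypothetical names made up for illustration; this is not how compact_str or smol_str are actually implemented, only the general small-string technique: short strings are stored inline, longer ones fall back to the heap.

```rust
/// Hypothetical inline capacity in bytes; real crates choose their own layout.
const INLINE_CAP: usize = 22;

/// Illustrative small-string type: short strings live inline,
/// longer ones are stored in a heap allocation.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SmallStr {
    Inline { len: u8, buf: [u8; INLINE_CAP] },
    Heap(Box<str>),
}

impl SmallStr {
    pub fn new(s: &str) -> Self {
        if s.len() <= INLINE_CAP {
            // Copy the bytes into a zero-padded inline buffer; no allocation.
            let mut buf = [0u8; INLINE_CAP];
            buf[..s.len()].copy_from_slice(s.as_bytes());
            SmallStr::Inline { len: s.len() as u8, buf }
        } else {
            // Too long for the inline buffer: allocate.
            SmallStr::Heap(s.into())
        }
    }

    pub fn as_str(&self) -> &str {
        match self {
            SmallStr::Inline { len, buf } => {
                // The bytes were copied whole from a valid &str, so they are valid UTF-8.
                std::str::from_utf8(&buf[..*len as usize])
                    .expect("inline bytes are valid UTF-8")
            }
            SmallStr::Heap(s) => &**s,
        }
    }

    pub fn is_inline(&self) -> bool {
        matches!(self, SmallStr::Inline { .. })
    }
}

/// An owned token: no lifetime parameter, unlike a token holding `&str`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OwnedToken {
    Identifier(SmallStr),
    Number(SmallStr),
}
```

Most identifiers and numbers are short, so `SmallStr::new("while")` would stay inline, while an unusually long identifier takes the heap path. The cost relative to `&str` is the byte copy and a larger token type.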
| Measter wrote:
| Another alternative would be the use of a string interner, with
| the tokens storing the interner ID.
|
| Advantages would be that the token type can stay small and
| Copy, while not having a lifetime to carry around.
|
| Disadvantages would be the overhead of the interning, which
| would slow down lexing, and you'd need to drag around the
| interner to anywhere you need the actual string.
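
The interner approach described above could look roughly like this. It is a sketch under assumptions: `Symbol`, `Interner`, and `InternedToken` are hypothetical names, and the interner stores each string twice for simplicity, which a production interner would likely avoid.

```rust
use std::collections::HashMap;

/// Small, Copy handle that stands in for an interned string.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Hypothetical interner: maps each distinct string to a stable ID.
#[derive(Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    /// Returns the existing ID for `s`, or assigns a new one.
    /// The hash lookup here is the lexing overhead mentioned above.
    pub fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_owned());
        self.ids.insert(s.to_owned(), sym);
        sym
    }

    /// Getting the text back requires access to the interner.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

/// Token stays small and Copy, with no lifetime parameter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InternedToken {
    Identifier(Symbol),
    Number(Symbol),
}
```

Interning "foo" twice yields the same `Symbol`, so comparing two identifiers becomes an integer comparison. The trade-off is visible in `resolve`: any code that wants the text has to be handed the `Interner` as well.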
| vlmutolo wrote:
| I wonder if string interning is advantageous for a lexer,
| where you know the strings you'd want to intern ahead-of-time
| (AoT). If you have reserved words, those will probably end up
| as enum variants in "Token". And things you can't know AoT
| are less likely to be amenable to interning, like comments
| and string literals. Tokenization already includes something
| like a manual string interning process.
|
| Originally I had the same thought as you, and that's why I
| hunted down the smol_str library. I knew that Aleksey used it
| in his rust-analyzer parser and figured it was an interner.
|
| But then I saw it wasn't (other than for whitespace kinda)
| and started to wonder whether string interning fits this problem.
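
The point about reserved words already being "interned" as enum variants can be sketched as follows. The keyword set and the names `Token` and `classify` are assumptions for illustration, not taken from the article's lexer.

```rust
/// Reserved words become payload-free variants; only identifiers carry text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token<'a> {
    If,
    Else,
    While,
    Return,
    Identifier(&'a str),
}

/// Classifies a scanned word. Matching against the fixed keyword list
/// plays the role of a manual interning step for strings known ahead of time.
pub fn classify(word: &str) -> Token<'_> {
    match word {
        "if" => Token::If,
        "else" => Token::Else,
        "while" => Token::While,
        "return" => Token::Return,
        other => Token::Identifier(other),
    }
}
```

Here `classify("while")` returns `Token::While` with no string stored at all, so only identifiers, comments, and literals would remain as candidates for a real interner.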
___________________________________________________________________
(page generated 2022-05-30 23:02 UTC)