[HN Gopher] We rewrote our Rust WASM parser in TypeScript and it...
___________________________________________________________________
We rewrote our Rust WASM parser in TypeScript and it got faster
Author : zahlekhan
Score : 279 points
Date : 2026-03-20 21:48 UTC (1 days ago)
(HTM) web link (www.openui.com)
| blundergoat wrote:
| The real win here isn't TS over Rust, it's the O(N^2) -> O(N)
| streaming fix via statement-level caching. That's a 3.3x
| improvement on its own, independent of language choice. The WASM
| boundary elimination is 2-4x, but the algorithmic fix is what
| actually matters for user-perceived latency during streaming.
| Title undersells the more interesting engineering imo.
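A minimal sketch of the difference between re-parsing and statement-level caching. It assumes, hypothetically, that statements are newline-terminated and that `parseStatement` stands in for the real parser; it is not the article's code. The naive version re-parses the whole buffer on every chunk. The caching version parses each statement once its newline arrives and only re-parses the unfinished tail.

```typescript
// Hypothetical AST node; the real parser's output type is not shown here.
interface Statement {
  text: string;
}

// Hypothetical stand-in for parsing one complete or partial statement.
function parseStatement(source: string): Statement {
  return { text: source.trim() };
}

// Naive: re-parse the whole buffer on every chunk. With N statements the
// total number of parseStatement calls grows as O(N^2).
class NaiveStreamParser {
  private buffer = "";
  parseCalls = 0;

  push(chunk: string): Statement[] {
    this.buffer += chunk;
    const out: Statement[] = [];
    for (const line of this.buffer.split("\n")) {
      if (line.length === 0) continue;
      this.parseCalls++;
      out.push(parseStatement(line));
    }
    return out;
  }
}

interface ParseState {
  completed: readonly Statement[]; // cached, never re-parsed
  pending: Statement | null; // the unfinished trailing statement, if any
}

// Incremental: a statement is cached once its terminating newline arrives,
// and only the unfinished tail is re-parsed, so total work is O(N).
class CachingStreamParser {
  private completed: Statement[] = [];
  private tail = "";
  parseCalls = 0;

  push(chunk: string): ParseState {
    const pieces = (this.tail + chunk).split("\n");
    // The last piece has no newline after it yet, so it may still grow.
    this.tail = pieces.pop() ?? "";
    for (const piece of pieces) {
      if (piece.length === 0) continue;
      this.parseCalls++;
      this.completed.push(parseStatement(piece));
    }
    let pending: Statement | null = null;
    if (this.tail.length > 0) {
      this.parseCalls++;
      pending = parseStatement(this.tail);
    }
    return { completed: this.completed, pending };
  }
}
```

Feeding 100 statements, each arriving as two chunks (the text, then its newline), the naive parser makes 10,100 `parseStatement` calls and the caching one makes 200. Each chunk still costs time proportional to the unfinished statement, which is why the cache works at statement granularity.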
| shmerl wrote:
| More like a misleading clickbait.
| sroussey wrote:
| Yeah, though the n^2 is overstating things.
|
| One thing I noticed was that they time each call and then use a
| median. Sigh. In a browser. :/ With timing attack defenses
| built into the JS engine.
| fn-mote wrote:
| For those of us not in the know, what are we expecting the
| results of the defenses to be here?
| sroussey wrote:
| Jitter. It makes precise timings unreliable. Time the whole
| run of 1000 iterations and divide by 1000 instead of starting
| and stopping 1000 timers.
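A minimal sketch of the batch-timing approach described above, using the standard `performance.now()` (a global in browsers and in Node.js 16+). Browsers deliberately coarsen and may jitter this clock, so each clock read carries some error. Timing one long loop spreads that error over every iteration, whereas per-call timers pay it on every call. `parseOnce` is a hypothetical stand-in for the code under test.

```typescript
// Time `runs` calls of `fn` with a single pair of clock reads and return the
// mean duration per call in milliseconds.
function meanTimePerCall(fn: () => void, runs = 1000): number {
  // Warm up first so JIT compilation is not part of the measurement.
  for (let i = 0; i < Math.min(runs, 100); i++) fn();
  const start = performance.now();
  for (let i = 0; i < runs; i++) fn();
  const elapsed = performance.now() - start;
  return elapsed / runs;
}

// Hypothetical workload; writing to `sink` keeps the engine from
// optimizing the call away.
let sink = 0;
function parseOnce(): void {
  sink += JSON.parse('{"a":[1,2,3]}').a.length;
}

const perCall = meanTimePerCall(parseOnce, 1000);
console.log(`~${perCall.toFixed(4)} ms per call`);
```

The trade-off is that a batch mean also absorbs any GC pause that lands inside the loop, which a per-call median would filter out.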
| Aurornis wrote:
| > Title undersells the more interesting engineering imo.
|
| Thanks for cutting through the clickbait. The post is
| interesting, but I'm so tired of being unnecessarily
| clickbaited into reading articles.
| socalgal2 wrote:
| Same for uv, but no one gets that message. They just think
| "rust rulez!" and ignore that all of uv's benefits are algo,
| not lang.
| estebank wrote:
| Some architectures are made easier by the choice of
| implementation language.
| crubier wrote:
| In my experience Rust typically makes it a little bit
| harder to write the most efficient algo actually.
| catlifeonmars wrote:
| That's usually ok bc in most code your N is small and
| compiler optimizations dominate.
| Defletter wrote:
| Would you be willing to give an example of this?
| lukeweston1234 wrote:
| Not OP, but one example where something is a bit harder to do
| in Rust than in C, C++, Zig, etc. is mutability on disjoint
| slices of an array. Rust offers a few utilities, like
| chunk_by, split_at_mut, etc., but for certain data structures
| and algorithms it can be a bit annoying.
|
| It's also worth noting that unsafe Rust != C, and you are
| still battling these rules. With enough experience you
| gain an understanding of these patterns and it goes away,
| and you also have really solid tools like Miri for
| finding undefined behavior, but it can be a bit of a
| hassle.
| catlifeonmars wrote:
| Has no one written a python! macro for this use case?
| foldr wrote:
| Mutating tree structures tends to be a fiddle (especially
| if you want parent pointers).
| EdwardDiego wrote:
| UV also has the distinct advantage in dependency resolution
| that it didn't have to implement the backwards compatible
| stuff Pip does, I think Astral blogged on it. If I can find
| it, I'll edit the link in.
|
| _edit_ wasn't Astral, but here's the blog post I was
| thinking of. https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html
|
| That said, your point is very much correct, if you watch or
| read the Jane Street tech talk Astral gave, you can see how
| they really leveraged Rust for performance like turning
| Python version identifiers into u64s.
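A rough illustration of the idea of turning version identifiers into integers, and explicitly not uv's actual encoding: uv has to handle the full PEP 440 grammar (epochs, pre-, post- and dev-releases), so its real layout differs. Here a plain `major.minor.patch` string is packed into one 64-bit value so that ordering two versions is a single integer comparison. The 16-bits-per-field layout and the `packVersion` name are assumptions. TypeScript needs `bigint` (target ES2020+) because `number` only represents integers exactly up to 53 bits.

```typescript
// Hypothetical layout: major in bits 32-47, minor in bits 16-31, patch in
// bits 0-15. Only plain numeric "X", "X.Y" or "X.Y.Z" versions are accepted.
function packVersion(version: string): bigint {
  const parts = version.split(".");
  if (parts.length > 3 || !parts.every((p) => /^\d+$/.test(p))) {
    throw new Error(`unsupported version: ${version}`);
  }
  const nums = parts.map(Number);
  if (nums.some((n) => n > 0xffff)) {
    throw new Error(`component too large: ${version}`);
  }
  const [major, minor = 0, patch = 0] = nums;
  return (BigInt(major) << 32n) | (BigInt(minor) << 16n) | BigInt(patch);
}

// Pack each version once, then sort on the packed integers.
const versions = ["3.9.18", "3.12.1", "3.10"];
const sorted = versions
  .map((v) => ({ v, key: packVersion(v) }))
  .sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0))
  .map((p) => p.v);
// sorted is ["3.9.18", "3.10", "3.12.1"]; a plain string sort would put "3.9.18" last.
```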
| rowanG077 wrote:
| That's a pretty big claim. I don't doubt that a lot of uv's
| benefits are algo. But everything? Non-I/O-bound native code
| should run an order of magnitude faster than Python.
| thfuran wrote:
| More than one, I'd think.
| jeremyjh wrote:
| It's a pretty well-supported claim. uv skips doing a number
| of things that generate file I/O. File I/O is far more
| costly than the difference in raw computation. pip can't
| drop those for compatibility reasons.
|
| https://nesbitt.io/2025/12/26/how-uv-got-so-fast.html
| rowanG077 wrote:
| I don't think the article you linked supports the claim
| that none of UV performance improvements are related to
| using rust over python at all. In fact it directly states
| the exact opposite. They have an entire section dedicated
| to why using Rust has direct performance advantages for
| UV.
| jeremyjh wrote:
| What it says is this:
|
| > uv is fast because of what it doesn't do, not because
| of what language it's written in. The standards work of
| PEP 518, 517, 621, and 658 made fast package management
| possible. Dropping eggs, pip.conf, and permissive parsing
| made it achievable. Rust makes it a bit faster still.
| rowanG077 wrote:
| Yes exactly! That quote directly disproves that all of
| the improvements UV has over competitors are because of
| algos, not because of rust.
|
| So the claim is not well supported at all by the article
| as you stated, in fact the claim is literally disproven
| by the article.
| jeremyjh wrote:
| You are right. 99% is not 100%.
| rowanG077 wrote:
| I don't think the article has substantive numbers. You'd
| have to re-implement UV in python to do that. I don't
| think anyone did that. It would be interesting at least
| to see how much UV spends in syscalls vs PIP and make a
| relative estimate based on that.
| kyralis wrote:
| This is either an overly pedantic take or a disingenuous
| one. The very first line that the parent quoted is
|
| > uv is fast because of what it doesn't do, not because
| of what language it's written in.
|
| The fact that the language had a small effect ("a bit")
| does not invalidate the statement that algorithmic
| improvements are the reason for the relative speed. In
| fact, there's no reason to believe that rust without the
| algorithmic version would be notably faster at all. Sure,
| "all" is an exaggeration, but the point made still stands
| in the form that most readers would understand it:
| algorithmic improvements are the important difference
| between the systems.
| rowanG077 wrote:
| I think we might be talking past each other a bit.
|
| The specific claim I was responding to was that all of
| uv's performance improvements come from algorithms rather
| than the language. My point was just that this is a
| stronger claim than what the article supports, the
| article itself says Rust contributes "a bit" to the
| speed, so it's not purely algorithmic.
|
| I do agree with the broader point that algorithmic and
| architectural choices are the main reason uv is fast, and
| I tried to acknowledge that, apparently unsuccessfully,
| in my very first comment ("I don't doubt that a lot of
| uv's benefits are algo. But everything?").
| ambicapter wrote:
| You are being very pedantic here.
| staticassertion wrote:
| Do you actually believe that UV would be as fast if it
| were written in Python?
| tinco wrote:
| It would come pretty close, probably close enough that
| you wouldn't be able to tell the difference on 90% of
| projects.
| staticassertion wrote:
| Vague. What's pretty close? I mean, even for IO bound
| tasks you can pretty quickly validate that the
| performance between languages is not close at all - 10 to
| 100x difference.
| tinco wrote:
| Sure, within 100ms. Who cares what the performance
| multiples are?
| staticassertion wrote:
| That literally makes no sense. 100ms... out of what? Is
| it 1ms vs 100ms? 100000ms vs 100100ms?
|
| Anyway, dubious claim since a Python interpreter will
| take 10s of milliseconds just to print out its version.
|
| Do you have any evidence? I can point at techempower
| benchmarks showing IO bound tasks are still 10-100x
| faster in native languages vs Python/JS.
| tinco wrote:
| I'm saying that the Rust might execute in 50ms and the
| Python in 150ms. You are the one not making sense, we are
| talking about application performance, why are you _not_
| measuring that in milliseconds.
|
| That is assuming Rust is 100x faster than Python btw,
| 49ms of I/O, 1ms of Rust, 100ms of Python.
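tinco's back-of-the-envelope numbers, written out under the same assumptions: 49 ms of kernel-side I/O common to both, 1 ms of Rust compute, and Python compute taken as 100 times the Rust figure.

```latex
\begin{aligned}
T_{\text{Rust}}   &= T_{\text{I/O}} + T_{\text{compute}} = 49\,\text{ms} + 1\,\text{ms} = 50\,\text{ms} \\
T_{\text{Python}} &= 49\,\text{ms} + 100 \times 1\,\text{ms} = 149\,\text{ms} \approx 150\,\text{ms} \\
\frac{T_{\text{Python}}}{T_{\text{Rust}}} &= \frac{149}{50} \approx 3
\end{aligned}
```

Under these assumptions the 100x compute gap shrinks to roughly 3x end to end because the shared I/O term does not scale. staticassertion's reply disputes the premise that the I/O term is the same across runtimes.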
| staticassertion wrote:
| > I'm saying that the Rust might execute in 50ms and the
| Python in 150ms.
|
| Okay, so the Rust code would be 3x as fast. Feels
| arbitrary, but sure.
|
| > You are the one not making sense, we are talking about
| application performance, why are you not measuring that
| in milliseconds.
|
| I explained why your post made no sense already...
|
| > That is assuming Rust is 100x faster than Python btw,
| 49ms of I/O, 1ms of Rust, 100ms of Python.
|
| That's not how anything works. Different languages will
| perform differently on IO work, different runtimes will
| degrade under IO differently, etc. That's why even basic
| echo HTTP servers perform radically differently in Python
| vs Rust.
|
| This isn't how computers work and it's not even how math
| works.
|
| This conversation has become nonsensical. The thing we
| can agree with is this - no, uv would not be as fast if
| it were written in Python.
| jeremyjh wrote:
| > Different languages will perform differently on IO
| work,
|
| IO is executed by kernel, file system or network drivers.
| IO performance is not dependent at all on which language
| makes the syscalls.
|
| > The thing we can agree with is this - no, uv would not
| be as fast if it were written in Python.
|
| In this thread, we are talking about the speed of uv in
| terms of user experience - how long a person waits for
| command line operations to complete. Things that pip
| takes multiple seconds to do, uv will do in dozens of
| milliseconds. If uv were written in python, it would take
| dozens of ms + a few dozens more, which means absolutely
| fuck all nothing in the context of the thousands of
| milliseconds saved over pip.
|
| It's possible a user might perceive a slight difference in
| larger projects, but if pip had been uv-but-in-python,
| the uv-in-rust project would never have been started in
| the first place because no one would have bothered
| switching.
|
| > This conversation has become nonsensical.
|
| Agreed. No one in this thread is disputing that Rust code
| is faster than Python, only that in this case it is
| completely insignificant in the face of all the useless
| file and network I/O that pip is doing, and uv is not.
| tinco wrote:
| > That's not how anything works. Different languages will
| perform differently on IO work, different runtimes will
| degrade under IO differently, etc. That's why even basic
| echo HTTP servers perform radically differently in Python
| vs Rust.
|
| > This isn't how computers work and it's not even how
| math works.
|
| What are you disagreeing with? There's some baseline
| amount of I/O that the kernel does for you, that's what
| I'm assuming is 50ms, and everything else like runtime
| degrading is overhead due to the language/platform
| choice. I'm saying Rust is upwards of 100x faster in that
| regard thanks to its zero cost abstraction philosophy.
| You can't just include the I/O baseline in a claim about
| Rust's performance advantage. You'll be really
| disappointed when Rust doesn't download your files 100x
| as fast as the Python file downloader.
|
| Anyway, I'm sorry I provoked your antagonism with my
| terse messages, I wasn't trying to be blasé. I believe uv
| is the sort of tool that wouldn't suffer much from the
| downsides of Python and that in most situations the
| reduced runtime overhead of Rust would have a negligible
| impact on the user experience. I'm not arguing that they
| shouldn't build uv in Rust. Most situations is not all
| situations, and when a tool is used so widely you'll hit
| all edge cases, from the point where the 10s of
| milliseconds of startup time matters to the point where
| Python's I/O overhead matters at scale.
| coldtea wrote:
| Just the fact that I can install a single binary is 10x
| better than an equally fast Python implementation.
| azakai wrote:
| O(N^2) -> O(N) was 3.3x faster, but before that, eliminating the
| boundary (replacing wasm with JS) led to speedups of 2.2x,
| 4.6x, 3.0x (see one table back).
|
| It looks like neither is the "real win". Both the language and
| the algorithm made a big difference, as you can see in the
| first column in the last table: going from wasm to JS was a big
| speedup, and improving the algorithm on top of that was another
| big speedup.
| nulltrace wrote:
| Yeah the algorithmic fix is doing most of the work here. But
| call that parser hundreds of times on tiny streaming chunks and
| the WASM boundary cost per call adds up fast. Same thing would
| happen with C++ compiled to WASM.
| hrmtst93837 wrote:
| WASM boundary overhead is only half the story. Once you start
| bouncing tiny chunks across JS and WASM over and over, the
| data shuffling and memory layout mismatch can trash cache
| behavior, pile on allocation churn, and turn a nice benchmark
| into something that looks nothing like a parser living inside
| a streaming pipeline. That's why most 'language duel' posts
| feel beside the point.
| catlifeonmars wrote:
| You're not wrong, but that win would not get as many views.
| It's not clickbaity enough.
| adastra22 wrote:
| No AI generated comments on HN please.
| wolvesechoes wrote:
| > The real win here isn't TS over Rust
|
| Kinda is. We came up with abstractions to help reason about
| what really matters. The more you need to deal with auxiliary
| stuff (allocations, lifetimes), the more likely you are to miss
| the big issue.
| coldtea wrote:
| The opposite: the more you rely on abstractions, the more you
| miss the lower-level optimization opportunities and lose
| understanding of algorithms and hardware.
| wolvesechoes wrote:
| > of algorithms
|
| Yes, sprinkling your code logic with malloc, .clone() or
| lifetime annotations on the other hand brings algorithmic
| enlightenment.
| coldtea wrote:
| Dealing with and having to think about the cost of malloc,
| clone() and lifetimes brings algorithmic enlightenment
| more than working in a high-abstraction ivory tower
| where things "magically happen".
|
| Is your argument that the average Python or Typescript
| dev gets to think and care more about algorithms than the
| average C/C++/Rust dev?
| zahrevsky wrote:
| They even directly conclude at the end of the article that
| improvements in algorithm are more important than the choice of
| language:
|
| > Algorithmic complexity improvements dominate language-level
| optimisations. Going from O(N^2) to O(N) in the streaming case
| had a larger practical impact than switching from WASM to
| TypeScript.
|
| Yet they still have chosen to put the "Rust rewrite" part in
| the title. I almost think it's clickbait.
| dmix wrote:
| That blog post design is very nice. I like the 'scrollspy'
| sidebar which highlights all visible headings.
|
| Claude tells me this is https://www.fumadocs.dev/
| sroussey wrote:
| Interesting, thanks. I need to make some good docs soon.
| dmix wrote:
| Good documentation is always worth the effort. Markdown
| explaining your products is gold these days with LLMs.
| nine_k wrote:
| "We rewrote this code from language _L_ to language _M_, and the
| result is better!" No wonder: it was a chance to rectify
| everything that was tangled or crooked, avoid every known bad
| decision, and apply newly-invented better approaches.
|
| So this holds even for _L = M_. The speedup is not in the
| language, but in the rewriting and rethinking.
| MiddleEndian wrote:
| Now they just need a third party who's never seen the original
| to rewrite their TypeScript solution in Rust for even more
| gains.
| nine_k wrote:
| Indeed! But only after a year or so of using it in
| production, so that the drawbacks would be discovered.
| baranul wrote:
| Truth. You can see improvement, even rewriting code in the same
| language.
| azakai wrote:
| You're generally right - rewrites let you improve the code -
| but they do have an actual reason the new language was better:
| avoiding copies on the boundary.
|
| They say they measured that cost, and it was most of the
| runtime in the old version (though they don't give exact
| numbers). That cost does not exist at all in the new version,
| simply because of the language.
| necovek wrote:
| It's doing copies and (de)serialization on both sides into
| native data types.
|
| If they used raw byte structures and implemented the caching
| improvements on the wasm side, the copies might not be as
| bad.
|
| But they still have an issue with multi-language stack:
| complexity also has a cost.
|
| Python/C combo does not have this issue because you can work
| with Python types natively in C, but otherwise, this is a
| cross-language conversion issue, and not a Rust issue at all.
| awesome_dude wrote:
| I think that they were honest about that to a degree, they
| pointed out that one source of the speed up was caused by the
| Python version fixing a bug they hadn't noticed in the C++
|
| Edit: fixed phone typos
| rabisg wrote:
| One of the authors here. While that's generally true, in this
| case it wasn't time that helped us learn what worked. It was a
| nagging sense that the architecture wasn't right, just days
| before launch, along with heavy instrumentation to test our
| assumptions.
| johnisgood wrote:
| I have been saying this for a while now (I thought it was
| obvious), and often I get downvoted when I point this out.
| spankalee wrote:
| I was wondering why I hadn't heard of Open UI doing anything with
| WASM.
|
| This new company chose a very confusing name that has been used
| by the Open UI W3C Community Group for over 5 years.
|
| https://open-ui.org/
|
| Open UI is the standards group responsible for HTML having
| popovers, customizable select, invoker commands, and accordions.
| They're doing great work.
| caderosche wrote:
| What is the purpose of the Rust WASM parser? Didn't understand
| that easily from the article. Would love a better explanation.
| joshuanapoli wrote:
| They use a bespoke language to define LLM-generated UI
| components. I think that this is supposed to prevent
| exfiltration if the LLM is prompt-injected. In any case, the
| parser compiles chunks streaming from the LLM to build a live
| UI. The WASM parser restarted from the beginning upon each
| chunk received. Fixing this algorithm to work more
| incrementally (while porting from Rust to TypeScript) improved
| performance a lot.
| evmar wrote:
| By the way, I did a deeper dive on the problem of serializing
| objects across the Rust/JS boundary, noticed the approach used by
| serde wasn't great for performance, and explored improving it
| here: https://neugierig.org/software/blog/2024/04/rust-wasm-to-js....
| slopinthebag wrote:
| Did you try something like msgpack or bebop?
| SCLeo wrote:
| They should rewrite it in rust again to get another 3x
| performance increase /s
| slowhadoken wrote:
| Am I mistaken or isn't TypeScript just Golang under the hood
| these days?
| iainmerrick wrote:
| Hmm, there's an in-progress rewrite of the TypeScript compiler
| in Go; is that what you mean?
|
| I don't think that's actually out yet, and more importantly, it
| doesn't change anything at runtime -- your code still runs in a
| JS engine (V8, JSC etc).
| koakuma-chan wrote:
| npm i -D @typescript/native-preview
|
| You can use it today.
| jeremyjh wrote:
| There is too much wrong here to call it a mistake.
| wiseowise wrote:
| Yes, you've uncovered grand conspiracy.
| szmarczak wrote:
| > Attempted Fix: Skip the JSON Round-Trip
| > We integrated serde-wasm-bindgen
|
| So you're reinventing JSON but binary? V8 JSON nowadays is highly
| optimized [1] and can process gigabytes per second [2], I doubt
| it is a bottleneck here.
|
| [1] https://v8.dev/blog/json-stringify
| [2] https://github.com/simdjson/simdjson
| kam wrote:
| No, serde-wasm-bindgen implements the serde Serializer
| interface by calling into JS to directly construct the JS
| objects on the JS heap without an intermediate
| serialization/deserialization. You pay the cost of one or more
| FFI calls for every object though.
|
| https://docs.rs/serde-wasm-bindgen/
| szmarczak wrote:
| Indeed, you're right. However, it still needs to encode and
| decode strings. WASM just needs native interop.
| neuropacabra wrote:
| This is a very unusual statement :-D
| nallana wrote:
| Why not a shared buffer? Serializing into JSON on this hot path
| should be entirely avoidable.
| mavdol04 wrote:
| I think a shared array just avoids the copy, not the
| serialization, which is the main problem, as they showed with
| the serde-wasm-bindgen test.
| notnullorvoid wrote:
| You can avoid the serialization in WASM by pushing structured
| bytes to the SharedArrayBuffer, then do serialization in JS
| which should be relatively cheap compared to pushing JSON
| strings across the boundary.
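A minimal sketch of the JS half of that idea, under stated assumptions. The record layout is hypothetical: a `u32` count, then per record a `u8` kind, a `u32` byte length and UTF-8 text, all little-endian like WASM memory. `encodeRecords` only simulates the WASM side. A real module would write into its own memory buffer. A plain `ArrayBuffer` is used because `SharedArrayBuffer` in browsers requires cross-origin isolation.

```typescript
// Hypothetical token record produced by the parser.
interface Token {
  kind: number; // 0-255
  text: string;
}

// Stand-in for the WASM side: write tokens as packed little-endian records.
function encodeRecords(tokens: Token[]): ArrayBuffer {
  const encoder = new TextEncoder();
  const bodies = tokens.map((t) => encoder.encode(t.text));
  const size = 4 + bodies.reduce((n, b) => n + 1 + 4 + b.length, 0);
  const buffer = new ArrayBuffer(size);
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  let offset = 0;
  view.setUint32(offset, tokens.length, true);
  offset += 4;
  tokens.forEach((t, i) => {
    const body = bodies[i];
    view.setUint8(offset, t.kind);
    offset += 1;
    view.setUint32(offset, body.length, true);
    offset += 4;
    bytes.set(body, offset);
    offset += body.length;
  });
  return buffer;
}

// JS side: one pass over the bytes builds plain objects directly, with no
// intermediate JSON string. Strings still have to be UTF-8 decoded.
function decodeRecords(buffer: ArrayBuffer): Token[] {
  const view = new DataView(buffer);
  const bytes = new Uint8Array(buffer);
  const decoder = new TextDecoder();
  let offset = 0;
  const count = view.getUint32(offset, true);
  offset += 4;
  const tokens: Token[] = [];
  for (let i = 0; i < count; i++) {
    const kind = view.getUint8(offset);
    offset += 1;
    const length = view.getUint32(offset, true);
    offset += 4;
    tokens.push({ kind, text: decoder.decode(bytes.subarray(offset, offset + length)) });
    offset += length;
  }
  return tokens;
}
```

As szmarczak points out further up, string decoding does not go away. What goes away is building a JSON string on one side and re-parsing it on the other.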
| ivanjermakov wrote:
| Good software is usually written on 2nd+ try.
| joaohaas wrote:
| God I hate AI writing.
|
| That final summary benchmark means nothing. It mentions
| 'baseline' value for the 'Full-stream total' for the rust
| implementation, and then says the `serde-wasm-bindgen` is '+9-29%
| slower', but it never gives us the baseline value, because
| clearly the only benchmark it did against the Rust codebase was
| the per-call one.
|
| Then it mentions: "End result: 2.2-4.6x faster per call and
| 2.6-3.3x lower total streaming cost."
|
| But the "2.6-3.3x" is by their own definition a comparison
| against the naive TS implementation.
|
| I really think the guy just prompted claude to "get this shit
| fast and then publish a blog post".
| chvish wrote:
| This. It's so annoying to read these types of blogs now where
| the writer clearly didn't put the effort to understand things
| fully or at least review the blog their LLM wrote. Who is this
| useful for?
| JimDabell wrote:
| The article as a whole makes no sense. They are generating UI
| with an LLM. How fast the UI appears to the user is going to be
| completely dictated by the speed of the LLM, not the speed of
| the serialisation.
| rabisg wrote:
| As an author of the blog: ouch. We did a little bit more than
| prompt Claude, but a lot of Claude prompting was definitely
| involved.
|
| I understand your frustration with AI writing though. We are a
| small team and given our roadmap it was either use LLMs to help
| collate all the internal benchmark results file into a blog or
| never write it so we chose the former. This was a genuinely
| surprising and counterintuitive result for us, which is why we
| wanted to share it. Happy to clarify any of the numbers if
| helpful.
| nssnsjsjsjs wrote:
| Rewrite bias. You want to also rewrite the Rust one in Rust for
| comparison.
| jeremyjh wrote:
| It would be surprising if rewriting in Rust could change the
| WASM boundary tax that the article identified as the actual
| problem.
| rabisg wrote:
| (author here) We'd be really surprised if a rewrite could fix
| the boundary tax but if it does, we'd happily move over to
| it. People (including me) really underestimate how insanely
| fast browser's JSON.parse is
| rented_mule wrote:
| Something not unlike this happened to me when moving some batch
| processing code from C++ to Python 1.4 (this was 1997). The batch
| started finishing about 10x faster. We refused to believe it at
| first and started looking to make sure the work was actually
| being done. It was.
|
| The port had been done in a weekend just to see if we could use
| Python in production. The C++ code had taken a few months to
| write. The port was pretty direct, function for function. It was
| even line for line where language and library differences didn't
| offer an easier way.
|
| A couple of us worked together for a day to find the reason for
| the speedup. Just looking at the code didn't give us any clues,
| so we started profiling both versions. We found out that the port
| had accidentally fixed a previously unknown bug in some code that
| built and compared cache keys. After identifying the small
| misbehaving function, we had to study the C++ code pretty hard to
| even understand what the problem was. I don't remember the exact
| nature of the bug, but I do remember thinking that particular
| type of bug would be hard to express in Python, and that's
| exactly why it was accidentally fixed.
|
| We immediately started moving the rest of our back end to Python.
| Most things were slower, but not by much because most of our back
| end was i/o bound. We soon found out that we could make
| algorithmic improvements so much more quickly, so a lot of the
| slowest things got a lot faster than they had ever been. And,
| most importantly, we (the software developers) got quite a bit
| faster.
| asa400 wrote:
| Fun story! Performance is often highly unintuitive, and even
| counterintuitive (e.g. going from C++ to Python). Very much an
| art as well as a science.
|
| Crazy how many stories like this I've heard of how doing
| performance work helped people uncover bugs and/or hidden
| assumptions about their systems.
| staticassertion wrote:
| It doesn't come off as unintuitive by my read. They had a bug
| that led to a massive performance regression. Rewriting the
| code didn't have that bug so it led to a performance
| improvement.
|
| They found that they had fewer bugs in Python so they
| continued with it.
| harpiaharpyja wrote:
| I think a lot of people (especially those who are only
| peripherally involved in development, like management)
| don't really consider performance regressions at all when
| thinking about how to get software to go faster.
|
| Meanwhile my experience has been that whenever there has
| been a performance issue severe enough to actually matter,
| it's often been the result of some kind of performance bug,
| not so much language, runtime, or even algorithm choices
| for that matter.
|
| Hence whenever the topic of how to improve performance
| comes up, I always, _always_ insist that we profile first.
| staticassertion wrote:
| My experience has been that performance bugs show up in
| lots of places and I'm very lucky when it's just a bug.
| The far more painful performance issues are language and
| runtime limitations.
|
| But, of course, profiling is always step one.
| asveikau wrote:
| > After identifying the small misbehaving function, we had to
| study the C++ code pretty hard to even understand what the
| problem was. I don't remember the exact nature of the bug, but
| I do remember thinking that particular type of bug would be
| hard to express in Python, and that's exactly why it was
| accidentally fixed.
|
| Pure speculation, but I would guess this has something to do
| with a copy constructor getting invoked in a place you wouldn't
| guess, that ends up in a critical path.
| NooneAtAll3 wrote:
| good ol' shallow-vs-deep copy
| andrewflnr wrote:
| Given the context, I'm thinking bad cache keys resulting in
| spurious cache misses, where the keys are built in some low-
| level way. Cache misses almost certainly have a bigger
| asymptotic impact than extra copies, unless that copy
| constructor is really heavy.
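One way the failure mode in andrewflnr's guess can look in TypeScript. This is purely an illustration: the original C++ bug is unknown, and `area`, `cachedAreaBuggy` and `cachedArea` are hypothetical. A `Map` compares object keys by identity. Building a fresh key object on every lookup therefore means every lookup misses, and the cache grows without bound. A canonical string key compares by value and hits as intended.

```typescript
type Rect = { w: number; h: number };

let computations = 0;
function area(r: Rect): number {
  computations++; // counts real (uncached) work
  return r.w * r.h;
}

// Buggy: a new array is built on every call and Map uses reference
// equality, so this key is never found again.
const cacheByObject = new Map<number[], number>();
function cachedAreaBuggy(r: Rect): number {
  const key = [r.w, r.h];
  const hit = cacheByObject.get(key);
  if (hit !== undefined) return hit;
  const value = area(r);
  cacheByObject.set(key, value);
  return value;
}

// Fixed: a canonical string key compares by value.
const cacheByString = new Map<string, number>();
function cachedArea(r: Rect): number {
  const key = `${r.w}x${r.h}`;
  const hit = cacheByString.get(key);
  if (hit !== undefined) return hit;
  const value = area(r);
  cacheByString.set(key, value);
  return value;
}
```

Calling each version 1000 times with the same rectangle does the real work 1000 times in the buggy version and once in the fixed one.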
| asveikau wrote:
| I'm just remembering a performance issue I heard of eons
| ago where a sorting function comparison callback
| inadvertently allocated memory. It made sorting very slow.
| Someone said in a meeting that sorting was slow, and we all
| had a laugh about "shouldn't have used the bubble sort!"
| But it was the key comparison doing something stupid.
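A sketch of the pattern asveikau describes and the usual fix, with hypothetical names. The slow comparator rebuilds a derived key, allocating new strings and arrays, on every comparison, which happens on the order of n log n times. The fix computes each key once up front and compares the cached keys. This is the decorate-sort-undecorate pattern, also called a Schwartzian transform.

```typescript
interface Entry {
  path: string;
}

// Hypothetical derived sort key; allocates a new string and array each call.
function sortKey(e: Entry): string {
  return e.path.toLowerCase().split("/").reverse().join("/");
}

let keyBuilds = 0;

// Slow: builds two keys for every single comparison.
function sortSlow(entries: Entry[]): Entry[] {
  return [...entries].sort((a, b) => {
    keyBuilds += 2;
    const ka = sortKey(a);
    const kb = sortKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Fast: build each key exactly once, sort the pairs, then drop the keys.
function sortFast(entries: Entry[]): Entry[] {
  keyBuilds += entries.length;
  return entries
    .map((e) => ({ e, k: sortKey(e) }))
    .sort((x, y) => (x.k < y.k ? -1 : x.k > y.k ? 1 : 0))
    .map((p) => p.e);
}
```

Both produce the same order because `Array.prototype.sort` is stable in current engines. Only the number of key constructions differs.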
| branko_d wrote:
| My guess would be bad hashing, resulting in too many
| collisions.
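An illustration of branko_d's guess. Both hash functions are chosen for the example, not taken from the story. Summing character codes ignores order, so every anagram lands in the same bucket. The 32-bit FNV-1a hash, computed here over UTF-16 code units, spreads them out. When colliding keys share one bucket, lookups degrade from roughly constant time toward a linear scan.

```typescript
// Weak hash: order-insensitive, so "abc", "bca" and "cab" all collide.
function sumHash(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) h = (h + s.charCodeAt(i)) >>> 0;
  return h;
}

// 32-bit FNV-1a over UTF-16 code units (equivalent to bytes for ASCII).
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// How many distinct buckets a set of keys occupies (prime bucket count).
function distinctBuckets(keys: string[], hash: (s: string) => number, buckets = 1021): number {
  return new Set(keys.map((k) => hash(k) % buckets)).size;
}

function permutations(s: string): string[] {
  if (s.length <= 1) return [s];
  const out: string[] = [];
  for (let i = 0; i < s.length; i++) {
    for (const rest of permutations(s.slice(0, i) + s.slice(i + 1))) out.push(s[i] + rest);
  }
  return out;
}

const keys = permutations("abcdef"); // 720 distinct keys
console.log(distinctBuckets(keys, sumHash), distinctBuckets(keys, fnv1a));
```

With `sumHash`, all 720 permutations share a single bucket. With FNV-1a they spread across hundreds.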
| ameixaseca wrote:
| My experience is the exact opposite.
|
| This was particularly true for one of the projects I've worked
| with in the past, where Python was chosen as the main language
| for a monitoring service.
|
| In short, it proved itself to be a disaster: just the Python
| process collecting and parsing the metrics of all programs
| consumed 30-40% of the processing power of the lower end boxes.
|
| In the end, the project went ahead for a while more, and we had
| to do all sorts of mitigations to get the performance impact to
| be less of an issue.
|
| We did consider replacing it all by a few open source tools
| written in C and some glue code, the initial prototype used few
| MBs instead of dozens (or even hundreds) of MBs of memory,
| while barely registering any CPU load, but in the end it was
| deemed a waste of time when the whole project was terminated.
| serial_dev wrote:
| Another anecdote, the team couldn't improve concurrency
| reliably in Python, they rewrote the service in about a month
| (ten years ago) in Go, everything ran about 20x faster.
| wiseowise wrote:
| > but in the end it was deemed a waste of time when the whole
| project was terminated.
|
| The main lesson of the story. Just pick Python and move fast,
| kids. It doesn't matter how fast your software is if nobody
| uses it.
| stephantul wrote:
| This is it. Getting something on the table for stakeholders
| to look at trumps anything else.
| ameixaseca wrote:
| It would have taken the same time, if not less, given the
| extra time for mitigations, trying different optimization
| techniques, runtimes, etc.
|
| One of the reasons the project was killed was that we
| couldn't port it to our line of low powered devices
| without a full rewrite in C.
|
| Please note this was more than a decade ago, well before
| Rust was the language it is today. I wouldn't choose
| anything else besides Rust today since it gives the best
| of both worlds: a truly high-level language with low-level
| resource controls.
| littlestymaar wrote:
| And this is why pretty much all commercial software is
| terrible and runs slower than the equivalent 20 years ago
| despite incredible advances in hardware.
| philipallstar wrote:
| For lots of software there wasn't an equivalent 20 years
| ago because there wasn't a language that would let
| developers explore semi-specified domains fast enough to
| create something useful. Unless it was visual basic, but
| we can't use that, because what would all the UX people
| be for?
| Aeolun wrote:
| You can use Go and get the best of both worlds.
| nickserv wrote:
| One of the slowest, most inefficient code bases I've ever
| worked on was in Go.
|
| The mentality was "the language is fast, so as long as it
| compiles we're good"... Yeah that worked out about as
| well as you'd expect.
| zeroc8 wrote:
| But that has nothing to do with the language.
| nickserv wrote:
| Absolutely, and it's a good language when used properly.
| This was more of a problem with the hype surrounding it.
| lowbloodsugar wrote:
| I would agree except for the python part. Sure, you gotta
| move fast, but if you survive a year you still gotta move
| fast, and I've never seen a python code base that was still
| coherent after a year. Expert pythonistas will claim,
| truthfully, that they have such a code base but the same
| can be said of expert rustaceans. I would stick to
| typescript or even Java. It will still be a shitshow after
| a year but not quite as fucked as python.
| miki123211 wrote:
| https://github.com/polarsource/polar/tree/main/server
|
| If you're writing FastAPI (and you should be if you're
| doing a greenfield REST API project in Python in 2026),
| just s/copy/steal/ what those guys are doing and you'll
| be fine.
| Someone wrote:
| > Just pick Python and move fast, kids. It doesn't matter
| how fast your software is if nobody uses it.
|
| The reason nobody uses your software could be that it is
| too slow. As an example, if you write a video encoder or
| decoder, using pure Python might work for postage-stamp
| sized video because today's hardware is insanely fast, but
| even, it likely will be easier to get the same speed in a
| language that's better suited to the task.
| wiseowise wrote:
| Learning that it's too slow takes users.
| bjoli wrote:
| if input() == "dynamic scope?":
|     defined = "happyhappy"
| print(defined)  # NameError unless the if-branch ran (no block scope)
|
| I'd rather not use python. The ick gets me every time.
| czhu12 wrote:
| Ditto for me. I had gotten so used to building web backends
| in Ruby and running at 700MB minimum. When I finally got
| around to writing a rust backend, it registered in the
| metrics as 0MB, so I thought for sure the application had
| crashed.
|
| Turns out the metrics just rounded to the nearest 5MB
| naasking wrote:
| > just the Python process collecting and parsing the metrics
| of all programs consumed 30-40% of the processing power of
| the lower end boxes.
|
| Just write the parsing loop in something faster like C or
| Rust, instead of the whole thing.
| Traubenfuchs wrote:
| He struggled with the algorithms, you struggled with the
| runtime.
|
| You are not the same.
| tda wrote:
| One advantage of python is that it is so slow that if you
| choose the wrong algorithm or data structure, that soon becomes
| obvious. And for complicated stuff this is exactly where I find
| the LLMs struggle. So I make a first version in Python, and
| only when I am happy with the results and the speed feels
| reasonable compared to the problem complexity, I ask Claude
| Code to port the critical parts to Rust.
| rabisg wrote:
| The last part is really interesting. It feels like the whole
| world will soon become Python/JS because that's what LLMs are
| good at. Very few people will then take the pain of
| optimizing it.
| eru wrote:
| The LLMs are pretty good at optimising.
|
| Not because they are brilliant, but because they are pretty
| good at throwing pretty much all known techniques at a
| problem. And they also don't tire of profiling and running
| experiments.
| elcritch wrote:
| Not just profiling, but decoding protocols too.
|
| Recently I tried Codex/GPT5 on updating a bluetooth
| library for batteries, and it was able to start capturing
| bluetooth packets and comparing them with the library's
| other models. It was indefatigable. I didn't even know it
| was so easy to capture BLE packets.
| anthk wrote:
| Wireshark would do that. But you need to understand low
| level tools, because in case of some BGP attack all you
| LLM developers will be fired on the spot.
|
| With a flaky internet connection, most of the current 'soy
| devs' would be useless. Even more so with boosted-up chatbots.
| mirsadm wrote:
| Not in my experience. They're pretty good at getting
| average performance which is often better than most
| programmers seem to be willing to aim for.
| miki123211 wrote:
| If there's one thing LLMs are really, really good at,
| it's having a target and then hitting / improving upon
| that target.
|
| If you have a comprehensive test suite or a realistic
| benchmark, saying "make tests pass" or "make benchmark go
| up" works wonders.
|
| LLMs are really good at knowing patterns, we still need
| programmers to know which pattern to apply when. We'll
| soon reach a point where you'll be able to say "X is
| slow, do autoresearch on X" and X will just magically get
| faster.
|
| The reason we can't yet isn't because LLMs are stupid,
| it's because autoresearch is a relatively new (last month
| or so) concept and hasn't yet entered into LLM
| pretraining corpora. LLMs can already do this, you just
| need to be a little bit more explicit in explaining
| exactly what you need them to do.
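| For the "realistic benchmark" half of that, here is a minimal
| harness sketch that a person or an agent could iterate against.
| The names are made up for illustration. It times batches of many
| calls rather than each call (browser timers are deliberately
| coarsened) and reports the median per-call time across batches.

```typescript
// Returns the median time per call, in milliseconds, across `batches`
// runs of `callsPerBatch` calls each.
function benchmark(fn: () => void, batches = 15, callsPerBatch = 1000): number {
  for (let i = 0; i < callsPerBatch; i++) fn(); // warm-up so the JIT can optimize
  const perCall: number[] = [];
  for (let b = 0; b < batches; b++) {
    const t0 = performance.now();
    for (let i = 0; i < callsPerBatch; i++) fn();
    perCall.push((performance.now() - t0) / callsPerBatch);
  }
  perCall.sort((x, y) => x - y);
  return perCall[Math.floor(perCall.length / 2)];
}

// Accumulate a result so the measured work cannot be optimized away.
let benchSink = 0;
const msPerCall = benchmark(() => {
  benchSink += JSON.parse('{"a":[1,2,3]}').a.length;
});
console.log(`${(msPerCall * 1000).toFixed(2)} µs per call`);
```

| A single number like this is what "make benchmark go up" needs:
| the function under test changes, the harness stays fixed.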
| philipallstar wrote:
| I've not tried this yet, but doesn't it use up loads of
| tokens? How do you do it efficiently?
| 9rx wrote:
| _> JS because thats what LLMs are good at._
|
| That has not been my experience. JS/TS requires the most
| hand-holding, by far. LLMs are no doubt assumed to be good
| at JS due to the sheer amount of training data, but a lot
| of those inputs are of really poor quality, and even among
| the high quality inputs there isn't a whole lot of
| consistency in how they are written. That seems to trip up
| the LLMs. If anything, LLMs might finally be what breaks
| the JS camel's back. Although browser dominance still makes
| that unlikely.
|
| _> Very few people will then take the pain of optimizing
| it_
|
| Today's LLMs rarely take the initiative to write
| benchmarks, but if you ask, they will, and then they iterate on
| optimizing using the benchmark results as feedback. It
| works fairly well. There is a conceivable near future where
| LLMs or LLM tools will start doing this automatically.
| rabisg wrote:
| My experience is from trying to get the React Native
| example to work with OpenUI. I felt Sonnet/Opus was much
| better at figuring out what's wrong with the current React
| implementation and fixing it than it was with React
| Native
|
| But yes I see what you mean and I think people are trying
| to solve it with skills and harnesses at the application
| layer, but it's not there yet.
| shevy-java wrote:
| > We immediately started moving the rest of our back end to
| Python. Most things were slower, but not by much because most
| of our back end was i/o bound.
|
| Would be kind of cool if e.g. python or ruby could be as fast
| as C or C++.
|
| I wonder if this could be possible, assuming we could modify
| both to achieve that outcome, but without turning them into a
| language like C or C++. Right now there is a strange
| divide between "scripting" languages and compiled ones.
| nubg wrote:
| @dang this is an ai slop account, check his other comments
| peter_retief wrote:
| I suspect that you used highly optimized algorithms written for
| python, like the vector algorithms in numpy? You will struggle
| to write better code, at least I would.
| masklinn wrote:
| Python 1.4 would be mid-to-late 90s, long before numpy and vector
| algorithms would have been available.
|
| I suspect it's more likely to be something like passing
| std::string by value not realising that would copy the string
| every time, especially with the statement that the mistake
| would be hard to express in Python.
| johnisgood wrote:
| Everything is new to the uninitiated. :P
| WalterBright wrote:
| > We soon found out that we could make algorithmic improvements
| so much more quickly
|
| It's true that writing code in C doesn't automatically make it
| faster.
|
| For example, string manipulation. 0-terminated strings (the
| default in C) are, frankly, an abomination. String processing
| code is a tangle of strlen, strcpy, strncpy, strcat, all of
| which require repeated passes over the string looking for the
| 0. (Even worse, reloading the string into the cache just to
| find its length makes things even slower.)
|
| Worse is the problem that, in order to slice a string, you have
| to malloc some memory and copy the string. And then carefully
| manage the lifetime of that slice.
|
| The fix is simple - use length-delimited strings. D relies on
| them to great effect. You can do them in C, but you get no
| succor from the language. I've proposed a simple enhancement
| for C to make them work
| https://www.digitalmars.com/articles/C-biggest-mistake.html but
| nobody in the C world has any interest in it (which baffles me,
| it is so simple!).
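| In this thread's TypeScript terms, the cost difference can be
| modeled with bytes in a Uint8Array (a model of the idea, not how
| C or D actually implement strings): a 0-terminated string has to
| be scanned to learn its length and copied to be sliced, while a
| length-delimited slice is just a view. The helper names are made
| up for illustration.

```typescript
// Model a C string as bytes terminated by 0 inside a Uint8Array.
// Finding the length means scanning for the terminator: O(n) every time.
function cStrlen(buf: Uint8Array, start: number): number {
  let i = start;
  while (i < buf.length && buf[i] !== 0) i++;
  return i - start;
}

// Slicing a 0-terminated string needs new memory with room for a fresh
// terminator, so the bytes are copied (and in C that copy must be freed later).
function cSlice(buf: Uint8Array, start: number, len: number): Uint8Array {
  const out = new Uint8Array(len + 1); // zero-filled, so out[len] is the terminator
  out.set(buf.subarray(start, start + len));
  return out;
}

// A length-delimited slice is just (buffer, offset, length): no scan, no copy.
function lenSlice(buf: Uint8Array, start: number, len: number): Uint8Array {
  return buf.subarray(start, start + len); // a view sharing buf's memory
}

const cBytes = new TextEncoder().encode("hello world\0");
const cLen = cStrlen(cBytes, 0);         // 11, found by scanning
const copiedWord = cSlice(cBytes, 6, 5); // "world" plus terminator, in new memory
const viewWord = lenSlice(cBytes, 6, 5); // "world" as a view, O(1)
console.log(cLen, new TextDecoder().decode(viewWord)); // 11 world
```

| The view costs nothing to create, but it shares memory with its
| source, which is exactly the lifetime question a language with
| built-in slices has to answer for you.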
|
| Another source of slowdown in C, I've discovered over the
| years, is that C is not a plastic language; it is a brittle one.
| The first algorithm you select for a C project gets so welded
| into it that it cannot be changed without great difficulty.
| (And we all know that algorithms are the key to speed, not
| coding details.) Why isn't C plastic?
|
| It's because one cannot switch back and forth between a
| reference type and a value type without extensively rewriting
| every use of it. For example:
|
|   struct S { int a; };
|   int foo(struct S s) { return s.a; }
|   int bar(struct S *s) { return s->a; }
|
| If you want to switch between reference and value, you've got
| to go through all your code swapping . and ->. It's just too
| tedious and never happens. In D:
|
|   struct S { int a; }
|   int foo(S s) { return s.a; }
|   int bar(S *s) { return s.a; }
|
| I discovered while working on D that there is _no reason_ for
| the C and C++ -> operator to even exist; the . operator covers
| both bases!
| zeroonetwothree wrote:
| I ported Python to C++ one time and it ran 10x faster with 10x
| less memory usage with no architectural changes
| slopinthebag wrote:
| This article is obviously AI generated and besides being jarring
| to read, it makes me really doubt its validity. You can get
| substantially faster parsing versus `JSON.parse()` by parsing
| structured binary data, and it's also faster to pass a byte array
| compared to a JSON string from wasm to the browser. My guess is
| that not only was this article AI generated, but also their
| benchmarks, and perhaps the implementation as well.
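| A minimal sketch of what "structured binary data" could look
| like here, assuming a made-up layout of three 32-bit integers per
| node (kind, start, length) that point back into the source text.
| The node kinds and the encodeNodes/decodeNodes helpers are
| hypothetical, not OpenUI's actual format.

```typescript
// Hypothetical node kinds, for illustration only.
const NodeKind = { Identifier: 1, Number: 2, String: 3 } as const;
type NodeKind = (typeof NodeKind)[keyof typeof NodeKind];

interface AstNode { kind: NodeKind; text: string }

const FIELDS = 3; // kind, start, length

// Producer side (the WASM side in the article's setup): fixed-size integer
// records in one buffer, instead of a JSON string.
function encodeNodes(spans: Array<[NodeKind, number, number]>): Int32Array {
  const out = new Int32Array(spans.length * FIELDS);
  spans.forEach(([kind, start, len], i) => {
    out.set([kind, start, len], i * FIELDS);
  });
  return out;
}

// Consumer side: walk the records and slice only the strings actually needed.
function decodeNodes(records: Int32Array, source: string): AstNode[] {
  const nodes: AstNode[] = [];
  for (let i = 0; i < records.length; i += FIELDS) {
    const start = records[i + 1];
    nodes.push({
      kind: records[i] as NodeKind,
      text: source.slice(start, start + records[i + 2]),
    });
  }
  return nodes;
}

const dslSource = 'title = "Sales" 42';
const nodeRecords = encodeNodes([
  [NodeKind.Identifier, 0, 5],
  [NodeKind.String, 9, 5],
  [NodeKind.Number, 16, 2],
]);
const astNodes = decodeNodes(nodeRecords, dslSource);
console.log(astNodes.map((n) => n.text)); // [ 'title', 'Sales', '42' ]
```

| The JS side never re-parses text to recover structure; it reads
| integers and materializes only the strings it needs.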
| StilesCrisis wrote:
| It's vibe code all the way down!
| jeremyjh wrote:
| > The openui-lang parser converts a custom DSL emitted by an LLM
| into a React component tree.
|
| > converts internal AST into the public OutputNode format
| consumed by the React renderer
|
| Why not just have the LLM emit the JSON for OutputNode ? Why is a
| custom "language" and parser needed at all? And yes, there is a
| cost for marshaling data, so you should avoid doing it where
| possible, and do it in large chunks when it's not possible to
| avoid. This is not an unknown phenomenon.
| envguard wrote:
| The WASM story is interesting from a security angle too. WASM
| modules inheriting the host's memory model means any parsing bugs
| that trigger buffer overreads in the Rust code could surface in
| ways that are harder to audit at the JS boundary. Moving to
| native TS at least keeps the attack surface in one runtime, even
| if the theoretical memory safety guarantees go down.
| marcosdumay wrote:
| It would be great if people stopped dismissing the problem that
| WASM not being a first-class runtime for the web causes.
| kennykartman wrote:
| I dream of the day in which there is no need to pass by JS and
| Wasm can do all the job by itself. Meanwhile, we are stuck.
| vmsp wrote:
| Not directly related to the post but what does OpenUI do? I'm
| finding it interesting but hard to understand. Is it an
| intermediate layer that makes LLMs generate better UI?
| rabisg wrote:
| It's the library that bridges the gap between LLMs and live UI.
| Best example would be to imagine you want to build interactive
| charts within your AI agent (like Claude)
|
| The most obvious approach would be to let LLMs generate code
| and render it but that introduces problems like safety, UI
| consistency and speed. OpenUI solves those problems and
| provides a safe, consistent and token optimized runtime for the
| LLMs to render live UI
| aquariusDue wrote:
| Is it kinda similar to the new GenUI SDK for Flutter in that
| sense?
|
| https://docs.flutter.dev/ai/genui
| rabisg wrote:
| Haven't looked in depth but yes it feels like they are
| solving the same problem.
|
| This is an alternative to json-render by Vercel or A2UI by
| Google which I'm guessing the flutter implementation is
| based on
| owenpalmer wrote:
| So this is an issue with WASM/JS interop, not with Rust per se?
| measurablefunc wrote:
| I tried a similar experiment recently w/ FFT transform for wav
| files in the browser and javascript was faster than wasm. It was
| mostly vibe coded Rust to wasm but FFT is a well-known algorithm
| so I don't think there were any low hanging performance
| improvements left to pick.
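| For context, this is the shape of code in question: a generic
| textbook iterative radix-2 Cooley-Tukey FFT over Float64Arrays,
| assuming a power-of-two length (a sketch, not the commenter's
| code). Tight numeric loops over typed arrays like this are the
| kind of code V8's optimizing JIT compiles well, which is one
| plausible reason plain JS kept up with the WASM build.

```typescript
// In-place iterative radix-2 FFT. `re` and `im` hold the real and
// imaginary parts; the length must be a power of two.
function fft(re: Float64Array, im: Float64Array): void {
  const n = re.length;
  if (n !== im.length || (n & (n - 1)) !== 0) {
    throw new Error("re/im must match and have a power-of-two length");
  }
  // Bit-reversal permutation so the butterflies can run in place.
  for (let i = 1, j = 0; i < n; i++) {
    let bit = n >> 1;
    for (; j & bit; bit >>= 1) j ^= bit;
    j ^= bit;
    if (i < j) {
      const tr = re[i]; re[i] = re[j]; re[j] = tr;
      const ti = im[i]; im[i] = im[j]; im[j] = ti;
    }
  }
  // Butterflies over sub-transforms of doubling size.
  for (let size = 2; size <= n; size <<= 1) {
    const half = size >> 1;
    const angle = (-2 * Math.PI) / size;
    const wRe = Math.cos(angle);
    const wIm = Math.sin(angle);
    for (let start = 0; start < n; start += size) {
      let curRe = 1;
      let curIm = 0; // twiddle factor w^k, updated by repeated multiplication
      for (let k = 0; k < half; k++) {
        const a = start + k;
        const b = a + half;
        const tRe = re[b] * curRe - im[b] * curIm;
        const tIm = re[b] * curIm + im[b] * curRe;
        re[b] = re[a] - tRe;
        im[b] = im[a] - tIm;
        re[a] += tRe;
        im[a] += tIm;
        const nextRe = curRe * wRe - curIm * wIm;
        curIm = curRe * wIm + curIm * wRe;
        curRe = nextRe;
      }
    }
  }
}

// Usage: the FFT of a unit impulse is flat, so every bin should be 1.
const impulseRe = new Float64Array(8);
const impulseIm = new Float64Array(8);
impulseRe[0] = 1;
fft(impulseRe, impulseIm);
console.log(Array.from(impulseRe));
```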
| wintermute4282 wrote:
| It looks like FFTW3 is working on wasm support:
| https://github.com/FFTW/fftw3/issues/293
|
| You could also try pretty fast fft:
| https://github.com/JorenSix/pffft.wasm
| simonbw wrote:
| Yeah if you're serializing and deserializing data across the JS-
| WASM boundary (or actually between web workers in general whether
| they're WASM or not) the data marshaling costs can add up. There
| is a way of sharing memory across the boundary though without any
| marshaling: TypedArrays and SharedArrayBuffers. TypedArrays let
| you transfer ownership of the underlying memory from one worker
| (or the main thread) to another without any copying.
| SharedArrayBuffers allow multiple workers to read and write to
| the same contiguous chunk of memory. The downside is that you
| lose all the niceties of any JavaScript types and you're
| basically stuck working with raw bytes.
|
| You still do get some latency from the event loop, because
| postMessage gets queued as a MacroTask, which is probably on the
| order of 10ms. But this is the price you have to pay if you want
| to run some code in a non-blocking way.
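| Both mechanisms can be seen without spinning up a worker. A
| minimal sketch, assuming a runtime with structuredClone (modern
| browsers, Node 17+): structuredClone with a transfer list follows
| the same transfer rules as postMessage, so the original buffer is
| detached rather than copied, while two views over one
| SharedArrayBuffer see each other's writes. In a browser,
| SharedArrayBuffer also requires a cross-origin-isolated page.

```typescript
// Transfer: ownership of the bytes moves and the sender's buffer is detached.
const sendBuf = new ArrayBuffer(4);
const sendView = new Uint8Array(sendBuf);
sendView.set([1, 2, 3, 4]);

// structuredClone with a transfer list behaves like postMessage's transfer.
const movedBuf = structuredClone(sendBuf, { transfer: [sendBuf] });
const recvView = new Uint8Array(movedBuf);
console.log(recvView.length, sendBuf.byteLength); // 4 0

// Sharing: two views over one SharedArrayBuffer see each other's writes.
const sharedBuf = new SharedArrayBuffer(4 * Int32Array.BYTES_PER_ELEMENT);
const writerView = new Int32Array(sharedBuf);
const readerView = new Int32Array(sharedBuf);
Atomics.store(writerView, 0, 42); // atomic write, safe even across workers
console.log(Atomics.load(readerView, 0)); // 42
```

| In a real app the two views would live in different workers, and
| Atomics would also be used to signal when data is ready.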
| jesse__ wrote:
| This should be the top comment
| osullivj wrote:
| Strongly agree from an Emscripten C++ wasm pov: it's key to
| minimise emscripten::val roundtrips. Caches must be designed
| for rectilinear data geometry, and SharedArrayBuffers are the
| way for bulk data. But only JS allows us to express asynchrony,
| so we need an on_completion callback design at the lang
| boundary.
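| On the JS side, a common way to consume such an on_completion
| callback is to wrap it in a Promise so callers can await it. A
| minimal sketch: startParse is a hypothetical callback-style export
| standing in for the module, simulated here with setTimeout rather
| than any real Emscripten API.

```typescript
type OnCompletion = (err: string | null, result: number) => void;

// Hypothetical callback-style boundary function, simulated with setTimeout.
function startParse(input: string, onCompletion: OnCompletion): void {
  setTimeout(() => {
    if (input.length === 0) onCompletion("empty input", 0);
    else onCompletion(null, input.length); // pretend result: characters consumed
  }, 0);
}

// Adapter: each call gets one Promise, settled once by the callback.
function parseAsync(input: string): Promise<number> {
  return new Promise<number>((resolve, reject) => {
    startParse(input, (err, result) => {
      if (err !== null) reject(new Error(err));
      else resolve(result);
    });
  });
}

parseAsync("Card(title)").then((n) => console.log("parsed", n)); // parsed 11
```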
| tankenmate wrote:
| Indeed a whole class of issues become moot if you just don't
| use javascript anywhere. In the browser world this is
| obviously difficult/impossible; I look forward to the day
| when WASM can run natively in a browser and doesn't need
| javascript at all, DOM, network, etc, etc. On the server
| side? Just steer clear of the javascript ecosystem
| altogether.
| fHr wrote:
| So the actual processing is faster in rust/c/c++ but the
| marshaling costs are so big that ts is faster in this case? No
| clue how something like swc does this, but there it's way faster
| than babel.
| sakesun wrote:
| I heard a lot of similar stories in the past when I started using
| Python 20+ years ago. A number of people claimed their solutions
| got faster when developed in Python, mainly because Python makes it
| easier to quickly pivot and experiment with various alternative
| methods, which finally yielded a more efficient outcome in the
| end.
| horacemorace wrote:
| I'm more of a dabbler dev/script guy than a dev, but Every.
| single. thing I ever write in javascript ends up being incredibly
| fast. It forces me to think in callbacks and events and promises.
| Python and C (or async!) seem easy and sorta lazy in comparison.
| jesse__ wrote:
| This somehow reminds me of the days when the fastest way to deep
| copy an object in javascript was to round trip through toString.
| I thought that was gross then, and I think this is gross now
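| The round-trip trick usually cited for this was
| JSON.parse(JSON.stringify(x)). A small sketch of why it was gross
| beyond speed, contrasted with structuredClone, which modern
| browsers and Node 17+ provide for deep copies: the JSON route
| quietly turns a Date into a string, drops undefined fields, and
| flattens a Map to {}.

```typescript
interface ReportDoc {
  title: string;
  created: Date;
  note?: string;
  tags: Map<string, number>;
}

const reportDoc: ReportDoc = {
  title: "report",
  created: new Date(0),
  note: undefined,
  tags: new Map([["q1", 1]]),
};

// The old trick: serialize to a string, then parse it back.
const viaJson = JSON.parse(JSON.stringify(reportDoc));
console.log(typeof viaJson.created); // string (the Date became an ISO string)
console.log("note" in viaJson);      // false (undefined fields are dropped)
console.log(viaJson.tags);           // {} (Map entries are not serialized)

// structuredClone walks the object graph and preserves these types.
const viaClone = structuredClone(reportDoc);
console.log(viaClone.created instanceof Date, viaClone.tags.get("q1")); // true 1
```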
| athrowaway3z wrote:
| It's also worth underlining that it's not just "The parsing
| computation is fast enough that V8's JIT eliminates any Rust
| advantage", but specifically that this kind of straight-forward
| well-defined data structures and mutation, without any strange
| eval paths or global access is going to be JITed to near native
| speed relatively easily.
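| As an illustration of that kind of code, here is a minimal
| tokenizer sketch for a toy grammar (not the openui-lang
| tokenizer; the token kinds are hypothetical). Every token comes
| from one constructor, so all tokens share one hidden class, and
| the hot loop compares integer char codes and records offsets
| instead of allocating substrings.

```typescript
const IDENT = 0, NUMBER = 1, PUNCT = 2; // hypothetical token kinds

// One constructor assigning the same fields in the same order, so every
// token object has the same shape (hidden class) in V8.
class Token {
  kind: number;
  start: number;
  end: number;
  constructor(kind: number, start: number, end: number) {
    this.kind = kind;
    this.start = start;
    this.end = end;
  }
}

const isDigit = (c: number): boolean => c >= 48 && c <= 57; // '0'..'9'
const isIdent = (c: number): boolean =>
  (c >= 97 && c <= 122) || (c >= 65 && c <= 90) || c === 95 || isDigit(c); // a-z A-Z _ 0-9

// Works on char codes and stores [start, end) offsets into the source.
function tokenize(src: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < src.length) {
    const c = src.charCodeAt(i);
    if (c === 32 || c === 9 || c === 10 || c === 13) { i++; continue; } // whitespace
    const start = i;
    if (isDigit(c)) {
      while (i < src.length && isDigit(src.charCodeAt(i))) i++;
      tokens.push(new Token(NUMBER, start, i));
    } else if (isIdent(c)) {
      while (i < src.length && isIdent(src.charCodeAt(i))) i++;
      tokens.push(new Token(IDENT, start, i));
    } else {
      i++; // single-character punctuation
      tokens.push(new Token(PUNCT, start, i));
    }
  }
  return tokens;
}

const sampleSrc = "chart = Bar(data, 42)";
const toks = tokenize(sampleSrc);
console.log(toks.map((t) => sampleSrc.slice(t.start, t.end)));
```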
| mwcampbell wrote:
| I hope we can still get to a point where wasm modules can
| directly access the web platform APIs and get JS out of the
| picture entirely. After all, those APIs themselves are
| implemented in C++ (and maybe some Rust now).
| shevy-java wrote:
| So ...
|
| Rust.
|
| WASM.
|
| TypeScript.
|
| I am slowly beginning to understand why WASM did not really
| succeed.
| bulbar wrote:
| Is this an outlier or has Rust started to be part of the
| establishment and being 'old' so that people want to share their
| "moving away from Rust" stories?
|
| I didn't mind reading articles that are not about how Rust is
| great in theory (and maybe practice).
| quotemstr wrote:
| There's a certain segment of the industry that's always chasing
| the newest thing. Many of them like Zig for some ghastly
| reason.
|
| That said, Rust does have real problems. Manual memory
| management _sucks_. People think GC is expensive? Well, keep in
| mind malloc() and free() take global locks! People just have
| totally bogus mental models of what drives performance. These
| models lead them to technical nonsense.
| zozbot234 wrote:
| This story is about moving away from WASM for an application
| that's unsuitable for it. It's not really about Rust.
| notnullorvoid wrote:
| It's not an unsuitable application for WASM. They could've
| drastically reduced the WASM boundary impact if instead of
| mapping to JSON in Rust they streamed out structured bytes to
| JS then mapped to JSON there. And the streaming fix was
| language independent.
|
| So it's more so a story about architectural mistakes.
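| A sketch of the "stream structured bytes, map on the JS side"
| idea, assuming a hypothetical fixed record of three little-endian
| int32 values per node. The decoder has to cope with chunks that
| split a record, so it carries leftover bytes between calls;
| RecordStreamDecoder and encodeRecords are illustrative names, not
| an existing API.

```typescript
const RECORD_BYTES = 12; // hypothetical layout: kind, start, length as int32 LE

interface NodeRecord { kind: number; start: number; length: number }

// Producer side, for the demo: pack records into bytes.
function encodeRecords(records: NodeRecord[]): Uint8Array {
  const view = new DataView(new ArrayBuffer(records.length * RECORD_BYTES));
  records.forEach((r, i) => {
    const off = i * RECORD_BYTES;
    view.setInt32(off, r.kind, true);
    view.setInt32(off + 4, r.start, true);
    view.setInt32(off + 8, r.length, true);
  });
  return new Uint8Array(view.buffer);
}

// Consumer side: accepts chunks of any size, including ones that split a
// record, and emits each record once all 12 of its bytes have arrived.
class RecordStreamDecoder {
  private pending = new Uint8Array(0); // bytes of an incomplete trailing record

  push(chunk: Uint8Array): NodeRecord[] {
    const bytes = new Uint8Array(this.pending.length + chunk.length);
    bytes.set(this.pending, 0);
    bytes.set(chunk, this.pending.length);

    const view = new DataView(bytes.buffer);
    const complete = Math.floor(bytes.length / RECORD_BYTES);
    const out: NodeRecord[] = [];
    for (let r = 0; r < complete; r++) {
      const off = r * RECORD_BYTES;
      out.push({
        kind: view.getInt32(off, true),
        start: view.getInt32(off + 4, true),
        length: view.getInt32(off + 8, true),
      });
    }
    this.pending = bytes.slice(complete * RECORD_BYTES); // carry the tail over
    return out;
  }
}

const streamInput = encodeRecords([
  { kind: 1, start: 0, length: 5 },
  { kind: 3, start: 9, length: 5 },
  { kind: 2, start: 16, length: 2 },
]);
const streamDecoder = new RecordStreamDecoder();
const decodedRecords: NodeRecord[] = [];
for (let i = 0; i < streamInput.length; i += 5) {
  // 5-byte chunks deliberately misalign with the 12-byte records.
  decodedRecords.push(...streamDecoder.push(streamInput.subarray(i, i + 5)));
}
console.log(decodedRecords.length); // 3
```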
| Dwedit wrote:
| JS and WASM share the main arraybuffer. It's just very not-
| javascript-like to try to use an arraybuffer heap, because then
| you don't have strings or objects, just index,size pairs into
| that arraybuffer.
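| From the JS side, an index,size pair into WASM memory looks
| roughly like this. A minimal sketch: a standalone
| WebAssembly.Memory stands in for a real module's exported memory,
| and writeString/readString plus the address 1024 are made up for
| illustration; normally the module's own code writes the bytes and
| hands JS the pointer and length.

```typescript
// One 64 KiB page of linear memory; a compiled module would normally export this.
const wasmMemory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the WASM side: write UTF-8 bytes at `ptr`, return the length.
function writeString(mem: WebAssembly.Memory, ptr: number, s: string): number {
  const bytes = new TextEncoder().encode(s);
  new Uint8Array(mem.buffer, ptr, bytes.length).set(bytes);
  return bytes.length;
}

// JS side: turn a (ptr, len) pair back into a JS string.
// The view is created per call because growing the memory replaces mem.buffer.
function readString(mem: WebAssembly.Memory, ptr: number, len: number): string {
  return new TextDecoder().decode(new Uint8Array(mem.buffer, ptr, len));
}

const strPtr = 1024; // arbitrary address for the demo
const strLen = writeString(wasmMemory, strPtr, "Card(title)");
const decodedText = readString(wasmMemory, strPtr, strLen);
console.log(strLen, decodedText); // 11 Card(title)
```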
|
| Anyway, Javascript is no stranger to breaking changes. Compare
| Chromium 47 to today. Just add actual integers as another
| breaking change, then WASM becomes almost unnecessary.
| fHr wrote:
| I almost can't believe this: swc, for example, is 80x faster than
| babeljs.
| gettingoverit wrote:
| In ye olden days, when WASM had just been added to browsers, the
| difference between native JS and boost::spirit in WASM was x200.
|
| In their worst case it was just x5. We clearly have some progress
| here.
| pjmlp wrote:
| This is why, when a programming language already has tooling for
| compilers, be it ahead-of-time or dynamic, it pays off to
| first go around validating algorithms and data structures before
| a full rewrite.
|
| Additionally, even after those options are exhausted, only key
| parts might need a rewrite, not the whole thing.
|
| However, I wonder how many care about actually learning about
| algorithms, data structures and mechanical sympathy in the age of
| Electron apps.
|
| It feels quite often that a rewrite is chosen, because knowing
| how to actually apply those skills is the CS stuff many think
| isn't worthwhile learning about.
| coldtea wrote:
| > _However, I wonder how many care about actually learning
| about algorithms, data structures and mechanical sympathy in
| the age of Electron apps._
|
| Never mind the age of Electron apps, even fewer care about
| those in the age of agents.
| pjmlp wrote:
| Agreed, however I would assert that in the age of agents,
| programming languages will become irrelevant to most, other
| than those druids lucky enough to write the AI runtime stack
| at the AI overlords.
|
| And those will still care about CS.
| moomin wrote:
| "We saw huge speed-ups when changing technology."
|
| Looks inside
|
| "The old implementation had some really inappropriate choices."
|
| Every time.
| LunaSea wrote:
| This has been known by Node.js developers for a while with many
| C++ core and NPM modules being rewritten in JavaScript to improve
| performance.
| bluelightning2k wrote:
| Great write up. It feels like craft in the age of slop.
|
| Not sold about the fundamental idea of OpenUI though. XML is a
| great fit for DSLs and UI snippets.
| twoodfin wrote:
| Are you kidding? To the extent this was "crafted" it was by an
| LLM from somebody's notes in a prompt.
|
| The other day, someone linked back to this 2018 post on finding
| a cache coherency bug in the Xbox 360 CPU:
|
| https://randomascii.wordpress.com/2018/01/07/finding-a-cpu-d...
|
| So much more genuinely engaging than _any_ of the AI-"enhanced"
| sloppy, confused, trite writing that gets to the front page
| here daily because it's been hyper-optimized for upvotes.
| rabisg wrote:
| We tried all formats - XML, json, jsonl, even toon - before
| deciding that we need to invest in OpenUI Lang
|
| The primary motivation was speed and schema cohesion. We were
| running a JSON based format, Thesys C1, in production for a
| year before we realized we cannot add features fast enough
| because we were fighting the LLMs at multiple levels. It's
| probably too much to write in a comment, but we'd like to write
| about the motivation and all the things we tried in a separate
| blog post soon.
| diablevv wrote:
| The real lesson here isn't "TypeScript beats Rust" - it's that
| WASM has non-trivial overhead that's easy to underestimate. The
| JS engine has spent decades being optimized specifically for the
| patterns JS/TS code tends to produce. When you cross the WASM
| boundary, you pay for it: serialization, memory copies, the
| impedance mismatch between WASM's linear memory model and JS's
| garbage-collected heap.
|
| For a parser specifically, you're probably spending a lot of time
| creating and discarding small AST nodes. That's exactly the kind
| of workload where V8's generational GC shines and where WASM's
| manual memory management becomes a liability rather than an
| asset.
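| To picture that workload, here is a toy recursive-descent parser
| for arithmetic (not the openui-lang grammar; the grammar and the
| Expr type are illustrative assumptions). Each step allocates a
| small, short-lived node object and evaluation walks the tree once
| and drops it, which is the allocate-and-discard pattern a
| generational GC's young-generation collections are built for.

```typescript
// Small immutable AST nodes, one object per number or operator.
type Expr =
  | { kind: "num"; value: number }
  | { kind: "bin"; op: "+" | "-" | "*" | "/"; left: Expr; right: Expr };

// Toy grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := number | '(' expr ')'
function parseArith(src: string): Expr {
  let pos = 0;
  // Skip spaces, then return the next character ("" at end of input).
  const peek = (): string => {
    while (src[pos] === " ") pos++;
    return src[pos] ?? "";
  };

  function factor(): Expr {
    if (peek() === "(") {
      pos++;
      const inner = expr();
      if (peek() !== ")") throw new Error(`expected ')' at ${pos}`);
      pos++;
      return inner;
    }
    const start = pos;
    while (/[0-9.]/.test(src[pos] ?? "")) pos++;
    if (start === pos) throw new Error(`expected a number at ${pos}`);
    return { kind: "num", value: Number(src.slice(start, pos)) };
  }

  function term(): Expr {
    let left = factor();
    while (true) {
      const op = peek();
      if (op !== "*" && op !== "/") return left;
      pos++;
      left = { kind: "bin", op, left, right: factor() }; // a new short-lived node
    }
  }

  function expr(): Expr {
    let left = term();
    while (true) {
      const op = peek();
      if (op !== "+" && op !== "-") return left;
      pos++;
      left = { kind: "bin", op, left, right: term() };
    }
  }

  const tree = expr();
  if (peek() !== "") throw new Error(`unexpected input at ${pos}`);
  return tree;
}

// Walk the tree once; afterwards every node is garbage for the young generation.
function evalExpr(e: Expr): number {
  if (e.kind === "num") return e.value;
  const l = evalExpr(e.left);
  const r = evalExpr(e.right);
  switch (e.op) {
    case "+": return l + r;
    case "-": return l - r;
    case "*": return l * r;
    case "/": return l / r;
  }
}

console.log(evalExpr(parseArith("1 + 2 * (3 - 1)"))); // 5
```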
|
| The interesting question is whether this scales. A parser that
| runs on small inputs in a browser is a very different beast from
| one processing multi-megabyte files in a tight loop. At some
| point the WASM version probably wins - the question is whether
| that workload actually exists in your product.
| mohsen1 wrote:
| When there is a solid test harness, AI Coding can do magic!
|
| It was able to beat XZ on its own game by a good margin:
|
| https://github.com/mohsen1/fesh
| applfanboysbgon wrote:
| > I had no idea how any of this works.
|
| This is apparent. xz's own game is not "a specialized
| compression pre-processor for x86_64 ELF binaries.". xz's own
| game is a general-purpose compression utility suited for a
| range of tasks, not optimized for one ridiculously specific
| domain. Also, any compression benchmark really ought to include
| speed of de/compression, not only compression ratio, as
| compression algorithms sit at different points along a scale,
| each favoring one side of the trade-off or the other.
| mohsen1 wrote:
| I never claimed to beat xz as a general-purpose compressor.
| .tar.xz is the dominant format for Linux source tarballs and
| distro packages. So optimizing for ELF + x86_64 is optimizing
| for a very real and common case, not some toy benchmark.
|
| btw the goal of the project was _not_ building a production-ready
| solution. It was a curious case of black box software
| development. Compression is great because input and output
| are precise bits. As for speed, I think it's comparable
| since it's using most of XZ infra anyways.
| rpodraza wrote:
| Press x to doubt
| gavinray wrote:
| Why weren't you able to use WASM shared heaps to get zero-copy
| behavior?
|
| AFAIK, you can create a shared memory block between WASM <-> JS:
|
| https://developer.mozilla.org/en-US/docs/WebAssembly/Referen...
|
| Then you'd only need to parse the SharedArrayBuffer at the end on
| the JS side
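| On the JS side the suggestion looks roughly like this. A minimal
| sketch, assuming a runtime with WebAssembly threads support (in
| browsers this also needs cross-origin isolation): a memory created
| with shared: true exposes a SharedArrayBuffer, and JS reads what
| the module wrote through its own typed-array view without a copy.
| A standalone memory stands in for the module here, and the
| offsets are arbitrary. Turning the raw integers into JS objects at
| the end is still a copy; only the transfer is avoided.

```typescript
// A shared memory must declare a maximum size (in 64 KiB pages).
const sharedMemory = new WebAssembly.Memory({ initial: 1, maximum: 4, shared: true });
const isSharedBuffer = sharedMemory.buffer instanceof SharedArrayBuffer;

// Stand-in for the module's side: write two results, then publish the count.
const moduleView = new Int32Array(sharedMemory.buffer);
Atomics.store(moduleView, 1, 7);
Atomics.store(moduleView, 2, 11);
Atomics.store(moduleView, 0, 2); // count written last, so readers see complete data

// JS side: its own view over the very same bytes, no copy and no postMessage.
const jsView = new Int32Array(sharedMemory.buffer);
const resultCount = Atomics.load(jsView, 0);
// Only this final step copies, turning raw integers into a JS array.
const sharedResults = Array.from(jsView.subarray(1, 1 + resultCount));
console.log(isSharedBuffer, sharedResults);
```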
___________________________________________________________________
(page generated 2026-03-21 23:01 UTC)