[HN Gopher] I'm Porting the TypeScript Type Checker Tsc to Go
___________________________________________________________________
I'm Porting the TypeScript Type Checker Tsc to Go
Author : msoad
Score : 230 points
Date : 2022-01-25 17:07 UTC (5 hours ago)
(HTM) web link (kdy1.dev)
(TXT) w3m dump (kdy1.dev)
| nevir wrote:
| Super skeptical that porting this to a new language is going to
| have the kind of performance gains they expect ... assuming they
| want to replicate all of tsc's functionality
| eyelidlessness wrote:
| It's already proven (esbuild, SWC, Bun) to drastically improve
| performance of TypeScript's other responsibility: compilation
| (tsc doesn't offer an option for this, but ts-node does, for
| comparison). And while the type checker is certainly more
| complex, I have a hard time imagining it won't show similar
| improvements in a native language.
| resoluteteeth wrote:
| Compiling typescript is basically just stripping the type
| definitions/annotations and other typescript-specific parts
| of the syntax out. That's a huge difference from actually
| type checking it.
| mariogintili wrote:
| k__ wrote:
| tsc relies in shared state and is slow, so they port it to a
| language that allows shared state.
|
| Strange reasoning.
| pdpi wrote:
| tsc is slow, so you port it to a faster language. tsc relies on
| shared state, so you use a language that supports shared state
| so you're just porting the same design to a new language
| instead of redesigning the whole program.
| Yoric wrote:
| (er, some text in the parent seems to have vanished)
|
| Sure, if you absolutely wish it to, in an `unsafe` block, you
| can define two writeable pointers to the same piece of data. I
| don't think I've ever seen anybody write such code in Rust
| (except perhaps when linking to a C library that required it?),
| nor have I ever felt the need to do so.
|
| Now, it is true that Rust is designed to discourage writing
| cyclic data structures. I can imagine that trying to port
| (rather than redesign) an algorithm that deeply requires cyclic
| data structures, that will make your life miserable.
| wizzwizz4 wrote:
| You actually _can 't_ define two `&mut` references to the
| same data in Rust. It's invalid Rust code; if the compiler
| notices, it'll refuse to compile, and if it _doesn 't_
| notice, your code will randomly break.
| Yoric wrote:
| You can define two `*mut`, though, in unsafe code.
| anderskaseorg wrote:
| A better option is to use Cell or RefCell; they allow
| completely safe shared mutability in single-threaded
| code.
|
| https://doc.rust-lang.org/std/cell/
| jerf wrote:
| It sounds like you think "tsc is slow" _because_ "it uses
| shared state". If that is what you're getting at, you've
| connected two unconnected statements.
|
| It does tend to imply the new tsc isn't going to be able to
| take much more advantage of concurrency than the current one,
| which may put a limit on what performance gains it can expect.
| I don't know if the current tsc is able to do anything
| concurrently. (Here, "concurrently" doesn't merely mean "able
| to have multiple events in flight and thus do multiple tasks at
| the same 'time'" but truly "doing work on multiple CPUs
| simultaneously".) Go is generally faster than Node, but it'll
| be hard to guess by how much in this case.
|
| A Rust port would be harder but would probably lead on in a
| direction to improve the concurrency. However, that may be an
| infeasible amount of work, not because of Rust per se but
| because a port is intrinsically easier than a port _and_ a
| rearchitecting.
| munificent wrote:
| The reasoning is only strange if you assume that the cause of
| slowness is the shared state.
|
| CPUs have no problems with mutability. JS is slow because of
| dynamic typing, dispatch, flexible object layout, and an object
| model that requires lots and lots of heap allocation and
| pointer chasing.
| pjscott wrote:
| tsc is a large JavaScript program run at the command line,
| which is a pathological case for a JIT-compiled language: by
| the time the hot spots get native code generated, tsc may be
| close to finished already. Porting it to any ahead-of-time
| compiled language, without significant changes to the structure
| of the code, is likely to be a big performance win.
| tshaddox wrote:
| As other people have pointed out, it doesn't seem like the
| author (or anyone else, perhaps other than you) is suggesting
| that shared state is a significant reason that tsc is slow.
|
| But moreover, my takeaway was that the author is making a
| distinction between a _complete rewrite_ and a _port_ , where
| the former seems to be any rewrite that just tries to do the
| same thing (but may be architecturally very different), while
| the latter seems to be an approach where you attempt to
| directly adapt the original source code into a new language on
| a statement-by-statement basis. The latter would likely require
| the target language to support the programming paradigms which
| are used heavily in the original source code.
| konart wrote:
| >shared mutability
|
| Sure but this goes against "Don't communicate by sharing memory",
| no?
| scottlamb wrote:
| > Sure but this goes against "Don't communicate by sharing
| memory", no?
|
| Yes, but that's advice rather than something Go requires. It's
| reasonable to say it's outweighed by the desire to avoid
| significantly restructuring the existing logic. The author is
| explicitly choosing to match the upstream program's structure
| to limit the project scope, and whether that structure is good
| or not is almost besides the point.
|
| btw: especially in the Rust community, shared mutability
| doesn't always mean shared between threads/tasks. It's also a
| concern in single-threaded/non-concurrent code due to aliasing.
| This case does actually seem to be involving multithreading
| given that the author mentioned "using 8 threads".
| newlisp wrote:
| Curious, can you elaborate on what are the issues of
| mutability in single-threaded code?
| scottlamb wrote:
| It comes down to how a language construct allocates freedom
| to the programmer vs the optimizing compiler.
|
| * Rust references are strict. &mut references guarantee
| that nothing else can access the backing memory while the
| &mut is valid. & references guarantee that nothing
| (including &mut by the rule above, or unsafe code with raw
| pointers by programmer's pinkie swear) can change the
| backing memory while the &mut is valid. This allows the
| compiler to perform more optimizations behind the scene,
| like assuming a value loaded into a register is still up-
| to-date, even if there's been a function call between.
|
| * Rust raw pointers make no guarantees (but require the
| programmer to type "unsafe" to use).
|
| * C pointers are IMHO weirder. iirc, two pointers of the
| same type may alias (unless you use the rare "restrict"
| keyword), but two pointers of different type can't, unless
| one is "char*".
|
| * AFAIK, Go pointers are at least as permissive as C's.
| They might not make any aliasing guarantees at all.
|
| * Other languages vary...
| sjhuang26 wrote:
| One thing that was not addressed in the post is the option of
| using a GC library in Rust. I'm not a Rust expert, but to me, it
| seems like this makes it quite feasible to quickly port a
| compiler. They could wrap most complex data structures in GC and
| optimize them as time goes by. That being said, I'd assume that
| the library's GC would be slower than Go, so perhaps that's a
| deal-breaker.
| jkelleyrtp wrote:
| TSC doesn't need to "stick around", right? Just a run-once and
| the program is over?
|
| In those cases, https://github.com/fitzgen/bumpalo works
| amazingly as an arena. You can pretty much forget about reference
| counting and have direct references everywhere in your graph. The
| disadvantage is that it's hard to modify your tree without
| leaving memory around.
|
| We use it extensively in http://github.com/dioxusLabs/dioxus and
| don't need to worry about Rc anywhere in the graph/diffing code.
| anonymousCar wrote:
| I think --watch mode means it's running while developing
| connor4312 wrote:
| Additionally, the language server used to power editors is
| long lived. TypeScript isn't just the compiler.
| smasher164 wrote:
| I wonder how it would fare if instead of rewriting from scratch,
| they applied automated transformation from the Typescript source
| code to Go. It wouldn't have to capture all the semantics of
| TS/JS, but be opinionated enough to transform the Typescript
| source.
|
| Another possibility is to use something like GraalVM native-image
| to compile Typescript's type-checker into a binary. This might
| offer a speedup over its execution on V8.
| marcus_holmes wrote:
| Given that ESBuild [0] is a way better replacement for Webpack
| already (and similarly written in Go) I can see how we could
| pretty soon have an entire toolchain written in Go for dealing
| with compiling JS.
|
| Hopefully the whole toolchain will be embeddable, too. Being able
| to do configuration-in-code and produce an executable that does
| the whole build step in milliseconds would be cool.
|
| [0] https://esbuild.github.io/
| newlisp wrote:
| The author of esbuild also started the implementation of
| esbuild in rust, found it painful too and switched to build it
| in go.
| pjmlp wrote:
| For me the ideal use case for Rust is really for places where
| automatic memory management is a hard sell, even if
| technically possible, kernels, drivers, hypervisors,
| firmware, GPGPU, ...
|
| For everything else I rather enjoy the productivity of
| automatic memory management.
| [deleted]
| moltonel3x wrote:
| SWC (by the same author as this tentative tsc port) is a part
| of more and more people's typescript toolchain, and is written
| in Rust. Having all your ecosystem in a single language feels
| nice, but pragmatically different tools might be better in
| different languages.
| marcus_holmes wrote:
| The point of having it all embeddable in Go is that we could
| write a single executable that handles the entire process,
| with the toolchain embedded and the configuration written as
| code.
| tantalor wrote:
| This is a "rewrite" not a "port".
| londons_explore wrote:
| At first I thought this was for type checking Go code... Ya know,
| all that interface{} stuff...
| wrs wrote:
| 62X seems like a lot. It makes me wonder if someone has put
| serious effort into profiling and optimizing the JavaScript
| version. If not, that could be a better use of time than a port.
| I'd guess the TypeScript team may not be interested in
| maintaining a Go version, so this port will will always be behind
| the current TypeScript definition.
| sam0x17 wrote:
| Oof trust me from experience you do _not_ want to write a
| compiler or parser or anything like that in Go. The string
| support is so limited -- for example you have to implement basic
| things like reverse string from scratch instead of them (and much
| more) being in the standard library. Much better to use something
| like Crystal or Rust.
| ricardobeat wrote:
| ESBuild seems to be doing fine.
|
| I'm a crystal fan though, would love to see more tooling use
| it.
| pjmlp wrote:
| Way better than C.
| ngrilly wrote:
| The only string handling a compiler needs is usually during
| tokenization. After that the compiler only manipulates tokens,
| AST, SSA, IR, etc.
| sam0x17 wrote:
| And then you face Go's usual awkward issues with having a
| lack of generics which becomes important when, say,
| traversing a tree.
| pjmlp wrote:
| Thankfully that will end next month.
| constexpr wrote:
| I'm the author of esbuild. I hadn't written Go before I
| started esbuild and I thought I would miss generics more,
| but I didn't. There is literally only one situation in the
| whole esbuild code base where I've found myself wishing for
| generics: sorting an array. The workaround is easy and
| involves implementing an interface with three methods, each
| of which is typically a single line:
| https://pkg.go.dev/sort#Interface. And I only needed a few
| of these interfaces for a few different types. That wasn't
| nearly enough of a reason to not use Go, especially
| considering the other advantages.
| Zababa wrote:
| I see a few people wanting to rewrite tsc in something else than
| Typescript because tsc itself is too slow. I don't remember
| seeing any "proof" or analysis that Typescript is the limiting
| factor in tsc. The basis often used is the performance of esbuild
| compared to other bundlers. From the esbuild FAQ, "Why is esbuild
| fast?":
|
| > Several reasons:
|
| > It's written in Go and compiles to native code.
|
| > Parallelism is used heavily.
|
| > Everything in esbuild is written from scratch.
|
| > Memory is used efficienty.
|
| Using another language will solve 1 and will maybe solve 2. From
| this FAQ too:
|
| > For example, many bundlers use the official TypeScript compiler
| as a parser. But it was built to serve the goals of the
| TypeScript compiler team and they do not have performance as a
| top priority. Their code makes pretty heavy use of megamorphic
| object shapes and unnecessary dynamic property accesses (both
| well-known JavaScript speed bumps).
|
| tsc itself seems like it could be optimized. By how much, I have
| no idea. But it sounds like it would be easier than writing
| something from scratch. esbuild, while a great tool, is limited
| in scope. Typechecking Typescript on the other hand is a moving
| target, and a fast one at that. All in all, I'm not convinced by
| this approach compared to "regular performance work" on tsc
| itself. I think people talk about that because it's easier to
| start your project than to change tsc. In that case, the problem
| sound political rather than technical.
|
| There's the x62 speed from the article, but there's no info on
| what features are supported. I feel like this is something where
| the devil is in the details.
|
| I'm happy being proven wrong on that subject, I fear that we're
| going increase the number of tools in the frontend ecosystem for
| no good reasons, and make every a bit harder again.
| n0w wrote:
| I had the same thought.
|
| A direct port of tsc to a "faster" language doesn't guarantee a
| faster type checker. Sure it might be a bit faster, but my
| guess is that the large performance wins come from a different
| design that takes performance into consideration from the
| start.
|
| Maybe the thought process is that a 1:1 rewrite will handle
| compatibility and feature completeness? I suppose you could
| optimise from there?
| scottlamb wrote:
| The "why not port to Rust" part boils down to these paragraphs:
|
| > tsc uses a lot of shared mutability and many parts depend on
| garbage collection. Even though I'm an advocate and believer in
| Rust, it doesn't feel like the right tool for the job here. Using
| Rust required me to use unsafe too much for this specific
| project.
|
| > tsc depends on shared mutability and has a cyclical mutable
| reference. Rust is designed to prevent this behavior. Having two
| references to the same data (shared mutability) is undefined
| behavior in Rust.
|
| I'd be curious to hear more. In particular:
|
| * Two references to the same data: were cells too awkward?
|
| * Actual garbage collection: my first instinct would be to use an
| arena for each compilation, destroying the whole thing at once
| rather than doing reference counting or garbage collection. I
| gather that's not cool in the age of language servers and
| incremental compilation, though. Does tsc support those?
| yazaddaruvala wrote:
| All of that is valid for a rewrite, but not efficient if you're
| porting the current implementation.
|
| As I understand it, in this case, Rust would be able to have a
| line to line port from JavaScript.
|
| I do still think your overall point holds. Why not use a
| library in the Rust ecosystem to help with this problem e.g.
| some GC<T> type.
| scottlamb wrote:
| > All of that is valid for a rewrite, but not efficient if
| you're porting the current implementation.
|
| Help me see it?
|
| I can easily see how getting rid of the cyclical structure
| crosses from "line-to-line port" into "rewrite" territory.
| I'm not saying you're wrong, but I'm having more trouble
| seeing how "wrap some types in cells and pass along a
| lifetime and arena parameter" crosses that threshold. It
| might not be feasible at all, though, if tsc compilations are
| long-lived for language server + incremental compilation
| support.
|
| > Why not use a library in the Rust ecosystem to help with
| this problem e.g. some GC<T> type.
|
| iirc I've seen some experimental GC library mentioned. I
| think something like that is at least off the beaten path
| enough that I can see how the author might decide it's better
| to just use Go instead.
| andreypopp wrote:
| Well... How many paid work hours spent on tsc? Then does
| TypeScript have a formal language specification? I'm pessimistic
| it's possible to rewrite tsc with full compat (bug-to-bug,
| otherwise it doesn't really make much sense) without MS
| support... but who knows... I'd like fast compiles too, so good
| luck!
| paavohtl wrote:
| A formal specification would be basically useless, if nothing
| guarantees that the reference implementation actually follows
| it. Sharing & contributing to the same test suite is a much
| more practical approach.
| andreypopp wrote:
| Specification isn't being followed is of course useless. But
| yeah, tests are definitely useless and in the absence of the
| spec is the only sane way to approach such rewrite.
| quickthrower2 wrote:
| As a smoke you can swap it in for a lot of the npm ecosystem
| and run the tests of those projects on the compiled code.
| silasb wrote:
| I'm surprised to see no one comment on Zig. It seems like the
| bun.sh developers are making some bold changes with such a low-
| level language.
| kbd wrote:
| I'm super bullish on Zig, but it's a bit early to bank on for
| production things as the language is still changing/being
| developed.
|
| To give another example of where this came up: they were
| discussing using Zig for the Ruby JIT, but despite really
| liking Zig the team decided to go with Rust because of its
| maturity: https://bugs.ruby-lang.org/issues/18481#note-21
| newlisp wrote:
| I'm more curious about if the author research Ocaml as an
| alternative.
| forty wrote:
| go means it's going to be easier to integrate to esbuild :)
|
| I'm happy to see this, as honestly I don't really understand why
| you would compile TS without type checking.
| apazzolini wrote:
| We use SWC with ts-node to strip types in dev for faster
| iteration loops and run the full tsc typechecker in CI (and
| optionally as a separate process on your local machine).
| forty wrote:
| Answering to myself, a colleague suggestion (from tsc doc
| actually): use tsc --noEmit (for type checking) and a fast
| compiler.
|
| tsc --noEmit duration is 2/3 of a normal build on our fairly
| large code base, so that can definitely speeds things up (for
| CI for example)
| eatonphil wrote:
| > I'm happy to see this, as honestly I don't really understand
| why you would compile TS without type checking.
|
| I don't think I'm remotely alone in doing this but I'll share
| from my perspective. I have two modes while writing TypeScript
| stuff: 1) prototyping something where I don't want to spend
| time writing types (or maybe I write some types but I don't
| care if they are correct or line up yet) but I still want to
| have JavaScript generated so I can test the thing out manually
| and 2) getting types correct as part of long-term maintenance
| and edge-case removal.
|
| There are fast TypeScript -> JavaScript generators today like
| swc and esbuild. tsc is not fast. So I can speed up "1)" from
| above if I use swc and esbuild and only use tsc for "2)" from
| above.
|
| As people write faster implementations of tsc-the-type-checker
| though I'll be happy to use that instead and just not use tsc
| anymore.
| tshaddox wrote:
| Out of curiosity, how often do you have a project in the
| first mode that gets large enough that the speed of compiling
| TS to JS is significant?
| eatonphil wrote:
| There is a huge difference even in small and mid-size
| projects.
|
| https://datastation.multiprocess.io/blog/2021-11-13-benchma
| r...
| nicoburns wrote:
| If you're writing UI code then even a few seconds is slow.
| You don't want to be waiting 5 seconds every time you move
| something a couple of pixels to the left.
| m-hilgendorf wrote:
| > 1) prototyping something where I don't want to spend time
| writing types
|
| It doesn't compile to JS, but this is one of the canonical
| use cases for Stanza 's (1) "optional" type system
|
| (1) http://lbstanza.org/optional_typing.html
| mixedCase wrote:
| Have you tried doing Type-Driven Development? IME, most of
| the time constraints are determined through types first with
| the code being done later tends to replace the game of "what
| kind of complex and non-obvious types fit my code the most"
| with a "oh, the implementation is obvious" reaction.
|
| It also makes it easier to determine what behavior is worth
| testing and which ones are already covered by your types, if
| you tend to write tests last. If instead you're used to Test-
| Driven Development you can do types->tests->code.
| eatonphil wrote:
| I've spent a lot of time programming in Standard ML and
| yeah that's a prerequisite there.
|
| But no I don't prefer it to what I do in TypeScript.
| forty wrote:
| That makes sense, I would generally use JS directly for mode
| 1), but I understand the use case.
| mattkrick wrote:
| Personally, I let vscode do typechecking on open files & have a
| pre-commit git hook to typecheck changed files.
|
| When it comes to starting up a development server & building
| the client, there's a huge cost to repeatedly typechecking the
| same 1000+ files. By cutting out typechecking & only compiling,
| I can reduce the webpack client build from 40 seconds to 3
| seconds (using sucrase).
| forty wrote:
| I'm answering to you but this is really for most siblings:
|
| I maintain a fairly large TS code base (nodejs backend). I
| think a full rebuild is 1:30 on my relatively recent laptop
| (i7-8650U, lots of RAM). But in practice I always use tsc -w
| and compile is mostly instant after editing a file (I do have
| to wait whenever I launch the command after I boot, but after
| that, it's fast enough).
|
| tsc now support incremental compilation too, though I haven't
| played with it too much as I'm happy with watch mode.
| rezonant wrote:
| Incremental compilation is _okay_ but it seems to perform
| full rebuilds if the previous build is _too_ old which is
| weird- meaning that more builds than expected take the
| entire time to complete. Nonetheless I use it for all of my
| builds. On the other hand this may just be because of the
| spiraling nature of type inference, where TS needs to re-
| examine a large amount of the program from a change that
| appears simple.
|
| Personally I only have a small number of projects that take
| more than 10-20 seconds to compile, but those ones are
| painful. I should probably do the same with -w for those.
| morelisp wrote:
| What if not everyone else on your team is paying attention to
| their editor's warnings? What if the LS's async response
| doesn't come fast enough? What if, god forbid, you have to
| change something on a different computer?
|
| If it's not in your CI/CD scripts, it's not actually getting
| checked.
|
| > I can reduce the webpack client build... to 3 seconds
|
| This is about 10x longer than any interaction with a computer
| should be.
| tdfirth wrote:
| This is similar to what I do (although I use esbuild),
| however like an idiot I just run tsc manually so obviously I
| forget sometimes and it takes 10 minutes to realise the build
| failed.
|
| Not great... I'm ashamed but it's just me on this project
| atm.
|
| edit: misread parent originally
| bastawhiz wrote:
| If it takes one minute to type check a codebase and ten seconds
| to compile it to JavaScript, then whenever I simply need to
| produce a JavaScript file without type checking I'll save
| myself the fifty seconds. If my editor's TS language server
| does the checking and my bundler is just compiling for my dev
| environment, I don't want both tools duplicating each other's
| work.
| postalrat wrote:
| You compile ts without type checking when the options to do so
| are 10x+ faster than with type checking.
| pjmlp wrote:
| I really don't get this, when I save my TS files they are
| already compiled into JS on the fly, ready to fed into a
| <script />.
| recursive wrote:
| By what? What if they were updated from an upstream repo,
| and never saved from an editor? What if you change branches
| without an upstream repo?
| pjmlp wrote:
| By VS and VSCode, the IDE/editor that belongs to the same
| company that does TypesScript.
|
| If I really need to call tsc explicitly, the few seconds
| that I have to wait, versus the minutes on some other
| languages, won't make me lose sleep over it.
| antihero wrote:
| It will be interesting to see if they create a language server so
| we can use it in VSCode.
| politician wrote:
| I'm sorrowful that such a significant percentage of this post was
| a preemptive apology for not using Rust.
| newlisp wrote:
| Well this a blow for the "Rewrite it in Rust" movement.
| davepeck wrote:
| (In a different direction, Microsoft's pyright type checker for
| Python -- IMO the fastest and most correct type checker in the
| Python ecosystem -- is implemented in TypeScript!)
| zem wrote:
| as someone who works on a competing typechecker, i'm definitely
| very impressed by pyright, especially since it has a single
| author! we have three people working on pytype and it
| definitely feels like too little; the way the pyright author
| manages such a great development velocity is mindblowing, and
| means they got the initial architecture very right.
| jupp0r wrote:
| > tsc depends on shared mutability and has a cyclical mutable
| reference. Rust is designed to prevent this behavior. Having two
| references to the same data (shared mutability) is undefined
| behavior in Rust.
|
| This problem has numerous solutions in Rust - (A)Rc, RefCell etc.
| I don't understand how this caused the author of the article to
| stop considering Rust.
| petmon wrote:
| Non-GC languages are bad in general at handling cyclical
| references (mutable or not), and Rust is especially ill suited
| for it because of the borrow checker. For example should an
| arbitrary reference in tsc become an rc::Rc or an rc::Weak? You
| have to tease out the implicit ownership model and one may not
| even exist.
|
| tsc was designed expecting GC and it makes perfect sense to
| port it to another fast GC language.
| armchairhacker wrote:
| I honestly like the explicit weak ref model and it's usually
| trivial (child objects are strong references, parents and
| siblings are weak), and rust has Gc as a last resort.
|
| But if the author's more familiar with Go, performance in Go
| is likely good enough
| throw10920 wrote:
| I'm not a Rust fanboy (in fact, I rather despise the RESF,
| especially given Rust's mediocre tooling), but: I think that
| "You have to tease out the implicit ownership model and one
| may not even exist" suggests a problem with the _program
| itself_.
|
| I can't think of any reason why a program _must_ have an ill-
| formed ownership model (unlike, say, the fact that many
| interesting programs must have an embedded dynamic type
| system (tagged unions etc) to do their thing).
|
| Now. The Rust language and existing tooling doesn't give you
| any help in figuring out the ownership model, but I can
| easily imagine a language and tools that _would_ help with
| that.
|
| This isn't meant to refute anything you've said (you said
| "[existing] non-GC languages are bad in general" not that
| non-GC languages _cannot_ be good) - just add clarity.
| bcrosby95 wrote:
| A program doesn't have to have an ill-formed ownership
| model. But if it does and you're trying to translate from
| one language to another, you should probably pick the
| language that makes the job easier.
|
| It reminds me of a programming assignment I had 20 years
| ago. We had to take our teacher's old FORTRAN program and
| "convert" it to Java or C++. I knew both equally well at
| the time - and I picked C++ because it had goto whereas
| Java didn't.
|
| This meant that I didn't really have to unravel the meaning
| of the program. Whereas with Java, I would have to.
| [deleted]
| adamnemecek wrote:
| > I rather despise the RESF, especially given Rust's
| mediocre tooling
|
| Which tooling are you talking about?
| chubot wrote:
| If your program is an interpreter, then I think the
| ownership model is going to be dominated by the program
| being interpreted, not the intepreter itself... That is,
| has to be general enough to interpret any program, and many
| of those programs will have cyclic data structures.
|
| TSC is not an interpreter, but its type system is Turing
| complete, so I wouldn't be surprised if it has some similar
| issues.
| klodolph wrote:
| Speaking as someone who has written non-trivial compilers
| in both Rust and Go...
|
| Compilers tend to use a bit thornier data structures and
| graphs than you might see in other realms. It is _very_
| convenient to just represent these graphs as objects
| connected by pointers, and know that you don 't have to
| worry about objects that become unreachable.
|
| For example, the values in an SSA-based IR are all full of
| pointers to other values. Each value represents the
| computation to produce that value, like "x12 / 4". A
| strength-reduction pass goes in and mutates this node,
| replacing it with "x12 >> 2" (if such a change is valid).
| It knows the change is valid because it follows pointers
| from the node to its inputs. The change is available to all
| upstream values immediately because it is mutated. These
| changes may make other values unreachable, at which point,
| you don't care about them. You only care about the values
| which are reachable from computations with side effects.
| These nodes _definitely_ have cycles and IMO, the notion of
| weak references between them doesn 't make sense.
|
| I'm sure there's a way to do that in Rust. Please, nobody
| try to explain how. I could figure it out. The thing is...
| it's not obvious how to do it in Rust, and figuring out how
| to "explain ownership to the Rust compiler" without
| creating Rc<RefCell<T>> cycle leaks is a non-trivial task
| that sucks away time I could be spending writing the
| compiler.
|
| This reminds me of a paper I saw go by years back about
| writing a compiler in Haskell. Pure Haskell code doesn't
| let you mutate objects and doesn't let you do pointer
| comparisons, so it creates problems if you're trying to
| implement an algorithm which is conceptually pure (at a
| high level, a function with inputs and outputs) but where
| you're taking the implementation from a paper which
| constructs the output iteratively, piece by piece. The
| problem is that your data structures have to be annotated
| to show which parts are mutable (IORef, STRef, or TVar),
| but "which parts are mutable" may have a different answer
| depending on whether you are inside some function
| transforming or constructing a data structure.
|
| It was obvious to me, reading the paper, that while I love
| functional purity, and functional purity solves all sorts
| of problems and makes it easier to write code _in general,_
| it sometimes gets in the way of getting shit done. The same
| is true of Rust 's notion of ownership... super useful when
| it lets me write fast, efficient code without having to
| think about ownership (when it just works and gets out of
| the way), and super annoying when you have to figure out
| how to explain your program to Rust.
| sealeck wrote:
| Rust does have a third-party `Gc` type (which used to be in
| the stdlib).
| speedgoose wrote:
| I think the point is valid. As a rust beginner I tried to make
| a graph data structure and it was difficult. Using RefCell is
| not very nice because it moves the problem from compile time to
| runtime and you still can't mutate the same data structure more
| than once at the same time.
| m-hilgendorf wrote:
| Rust encourages you to use an adjacent list/matrix for your
| graph representation, which is preferable almost all of the
| time for graph algorithms. It's a little unwieldy if the
| graph needs to be mutable during the core algorithms, but
| almost all algorithms in practice assume it will be stable.
| tomcam wrote:
| This is interesting but I don't know Rust. Presumably your
| graph data structure grows dynamically, meaning you have to
| add nodes at runtime. When. you say you can't mutate the same
| data structure more than once at runtime, what does it mean?
| That you can't set both a parent and a child pointer in the
| same function, or what?
| [deleted]
| [deleted]
| speedgoose wrote:
| I'm very far from being a rust expert but it was difficult
| to build a bi-directional graph because it's difficult to
| connect the nodes together. A simple recursive algorithm
| will not work. I ended up using a crate.
|
| I found the following article interesting on the topic :
| https://aminb.gitbooks.io/rust-for-c/content/graphs/
| jupp0r wrote:
| It shouldn't be difficult:
|
| 1. You have a graph struct that has ownership of all
| nodes (in Rc containers).
|
| 2. Nodes have Weak references to connected nodes
| slaymaker1907 wrote:
| While I hate Go and love Rust, there is also the issue that
| RefCell/Gc/Rc are very different in terms of how expensive a
| memory allocation is compared to node. Historically there were
| very similar problems in trying to write C++ like one would
| write Java. Memory allocators today are much better and closer
| to GC languages, but I suspect it will still be more expensive
| for very allocation heavy code.
|
| One way you could get this to work at the cost of some safety
| (depending on how you do it) would be to use arena style
| allocation. Obviously there are ways to do this safely in Rust,
| but it is probably easier to do it unsafely for a port.
|
| However, if you just use a high performance language with a GC
| like Go/C#/Java, the port will be easier and you won't have to
| worry about doing too many allocations.
| andyfleming wrote:
| Here's a relevant GitHub thread where some conversation that has
| been happening: https://github.com/swc-project/swc/issues/571
| hardwaregeek wrote:
| This reminds me of when people try to make a new Ruby
| implementation. They get the basics down and think "this isn't so
| bad! I can do this!". Fast forward a few months and they're stuck
| on some nasty little edge case around eval and method_missing
| that they need to run Rails. Or some weird aspect of C
| extensions.
|
| TypeScript's compiler is an unholy, unsound mess, because
| JavaScript is an unholy, unsound mess. Sure the basics of type
| checking structs and variables is somewhat easy. But what about
| conditional types? What about weird generic constraints? What
| about indexing types? There's some really confusing stuff going
| on underneath the hood.
|
| It's also a mess that *does not have a spec*. Even if you somehow
| manage to make a type checker that implements most of the
| features, there's no way to ensure it has 1 to 1 parity with
| TypeScript. Plus that parity can be broken at any point.
|
| The only way I could see this happening is if the author somehow
| convinced Anders Hejlsberg or a significant portion of the
| TypeScript team to work on this new implementation. I'm not sure
| that's gonna happen.
| spion wrote:
| The good thing is you have a ton of code on the Internet to use
| to test it. Unlike a dynamic language reimplementation, a
| static language transpiler can check if the results produced
| are identical to `tsc`
| scottlamb wrote:
| > This reminds me of when people try to make a new Ruby
| implementation. They get the basics down and think "this isn't
| so bad! I can do this!". Fast forward a few months and they're
| stuck on some nasty little edge case around eval and
| method_missing that they need to run Rails. Or some weird
| aspect of C extensions.
|
| I think that's exactly why the author wrote this:
|
| > Eventually, I realized that rewriting a massive project such
| as tsc would be extremely difficult to continue. But the
| performance gains were exciting. This led me to try another
| route: porting instead of a complete rewrite. So, I started
| looking at the TypeScript source code.
|
| It will require ongoing work to keep up with changes to
| upstream, but by matching upstream's structure as closely as
| possible (given the language difference), the path to
| supporting these nasty little edge cases should be relatively
| clear.
| mrkentutbabi wrote:
| What is the difference between porting and complete rewrite?
| I thought they are the same?
| scottlamb wrote:
| > What is the difference between porting and complete
| rewrite? I thought they are the same?
|
| As with so many terms in computing, there aren't
| universally agreed definitions. But in the author's post, I
| read "porting" as switching languages while changing as
| little as possible vs "complete rewrite" as
| redesigning/rearchitecting it along the way.
| travisd wrote:
| I think here "rewrite" means completely changing the
| internal structure of the code (while preserving the
| functionality) whereas "port" means copy the structure of
| the internal code with as little modification as possible.
| And it's hard to do the latter in Rust since Rust is not
| garbage collected.
| jacobr1 wrote:
| Porting can involve a spectrum of changes. If you are
| porting to a different language, often for bug-for-bug
| compatibility you try to maintain the same rough code and
| class/object structure as the source codebase. This might
| give you some non-idiomatic code at first, but once you
| have good test coverage you can refactor. In this case you
| might maintain some of the upstream code structure to
| better match future changes.
| ludamad wrote:
| Interesting, in theory you could make a rough js to golang
| compiler given the relatively strict subset of typescript
| that I recall the typescript compiler using (not that the
| strictness means a fully sound implementation is easy, but
| that you can use the regularity to cheat text transforms)
| danenania wrote:
| If they focus on the core features and make a significant
| enough improvement, perhaps the core TS team could be convinced
| to pull it in as a native extension and make calls to it for
| the operations it covers? If something like 62x improvement is
| really achievable, it's hard to imagine the team not being
| interested at all.
|
| I would personally be very happy to use a significantly slimmed
| down subset of TypeScript if it provided a 62x performance
| improvement. Slow type checking is currently the main drawback
| of TypeScript development in large projects.
| spion wrote:
| 62x improvement is not achievable. 6.2x might be.
|
| edit: I see they are running it with 8 threads, in which case
| yes 50x is achievable. In any sizable codebase though, you
| should probably split into modules and run with some
| parallelism via e.g. `wsrun` or `turborepo`
| Androider wrote:
| esbuild is > 125x faster than Webpack, doing the exact same
| job. It's not some theoretical microbenchmark, you'll see
| it when converting a project from one to the other. If a
| software hasn't been designed with speed in mind, and I
| don't think tsc has, there can be massive performance
| improvements possible if you create a new implementation
| with performance as a top goal.
| spion wrote:
| Webpack does way more than esbuild, including running a
| typechecking compiler instead of just transpiling,
| running compilers able to downlevel emit to ES5 and
| providing a deep plugin architecture allowing you to hook
| into any bit you like. But yes, it hasn't been designed
| with speed in mind - it has been designed for maximum
| extensibility instead. Its the same reason why Babel is
| slow compared to sucrase (written in JS, currently faster
| than SWC and esbuild but doing somewhat less -
| https://github.com/alangpierce/sucrase)
|
| tsc has in fact been designed with speed in mind (I've
| been following the project since before it moved to
| GitHub, which is when they redesigned it for speed).
| Going beyond 1 order of magnitude performance improvement
| while implementing the full feature set is highly
| unlikely.
| epolanski wrote:
| One of the main requirements of TypeScript is that the
| compiler has to work in a browser. The monaco editor is the
| prince jewel of TypeScript applications.
|
| So the only possibility I see is a webassembly alternative.
| hardwaregeek wrote:
| I'm not saying this is impossible but I think you'd have to
| work rather carefully to not have to parse the code twice and
| effectively run two type checkers in parallel. Which, on
| second though, you could run the fast one then the slow one.
| Perhaps that's an option.
|
| > I would personally be very happy to use a significantly
| slimmed down subset of TypeScript if it provided a 62x
| performance improvement. Slow type checking is currently the
| main drawback of TypeScript development in large projects.
|
| We all think that until the slimmed down subset doesn't work
| with our favorite libraries. Sure, there may be people who
| use no dependencies, but I'd bet that there's sophisticated
| types in some very popular libraries that even low dependency
| programmers use.
| travisd wrote:
| To this point, many teams are already using subsets of
| TypeScript to improve compilation times. Often times it's
| things like always declaring return types on functions to
| avoid the compiler having to infer it (especially between
| module boundaries). Calling this a "subset of typescript"
| might be strong wording, but it does suggest there is some
| desire in that general area.
| vosper wrote:
| > To this point, many teams are already using subsets of
| TypeScript to improve compilation times. Often times it's
| things like always declaring return types on functions to
| avoid the compiler having to infer it (especially between
| module boundaries).
|
| I hadn't heard of this trick - how much improvement does it
| make? Seems like it might be good for readability, too?
| travisd wrote:
| I haven't personally benchmarked it, but I know it's
| available as an eslint rule (all exported functions must
| have declared return types).
| fault1 wrote:
| It seems like depending on how you configure typescript
| (e.g, in tsconfig), typescript is already something like an
| ensemble of mini-languages with different dialects and
| semantics. Much more so than other languages. But I agree,
| restricted typescript that has to do less work (data or
| control flow analysis) would probably be on the orders of
| magnitude faster.
| Rowern wrote:
| The issue here is that if you're running in the JS ecosystem
| you'll definitely want to use other people code (npm package
| or internal lib), if the subset breaks JS compatibility then
| you can break a significant amount of code without realising,
| if it is "only" a TS subset, then you need to make sure that
| each lib/package you import are compatible. Anyway this does
| not seem like a good solution.
| shados wrote:
| to be fair, even within the TypeScript world that can be a
| problem. If typescript versions don't line up, or if your
| various strict options are too strict, you can get into
| really weird situations. It -generally- works, but when it
| doesn't, it's hell.
| koolba wrote:
| > I would personally be very happy to use a significantly
| slimmed down subset of TypeScript if it provided a 62x
| performance improvement. Slow type checking is currently the
| main drawback of TypeScript development in large projects.
|
| Isn't that already available by running in "transpile only"
| and having the type checking happen in the background?
| mcintyre1994 wrote:
| > I would personally be very happy to use a significantly
| slimmed down subset of TypeScript if it provided a 62x
| performance improvement.
|
| I'm just guessing but I think a lot of Typescript's really
| complex stuff is to support third party libraries with really
| complex types - at least they often refer to libraries
| they've improved support for when they discuss complex
| seeming additions. So I wonder how much of the JS ecosystem
| you'd have to cut off to get to such a subset. A particular
| example would be that I've often seen a chain of 20+
| dependencies leading to Lodash, and I'd guess that Lodash is
| really complex to type. If you lose everything that's
| eventually dependent on something that's really complex to
| type, do you have much of NPM left?
| acemarke wrote:
| As a maintainer for Redux, Redux Toolkit, and React-Redux,
| I can tell you that we would _love_ to have several
| additional features in TS that would make our library TS
| types much easier to maintain. Like, say, partially
| specified generics.
|
| Tanner Linsley (author of React Table, React Query, and
| other similar libs) has expressed similar concerns.
|
| (I'm actually starting to work on a talk for the upcoming
| TS Congress conference about "lessons learned maintaining
| TS libs", and this will be one of the things I point to.)
| armchairhacker wrote:
| Even if it's not official i could use a slimmed-down, faster
| TypeScript. Even if it has slightly different semantics. I
| always use skipLibCheck so TS only checks my code.
|
| It's a form of linting, not semantics, so it doesn't have to
| be exact.
| Waterluvian wrote:
| I am absolutely blown away at how good TypeScript's type
| checker is, given how much of an unholy mess JavaScript is, and
| that the TypeScript language cannot just deprecate all the
| awful parts.
|
| More than once I've yelled at my screen, "HOW DO YOU KNOW
| THIS?!" about some inferred or narrowed type.
| dmitriid wrote:
| However, more than once I've also yelled "HOW DO YOU NOT KNOW
| THIS?!" at something or other :)
| epolanski wrote:
| Just today I was triggered that this compiles.
|
| declare foo: never[] // it can only be an empty array
|
| foo[1] // compiles
|
| TypeScript is a nice type system, it's surprisingly
| expressive and flexible, like e.g. in Haskell I was
| surprised that there's this strange notion that for a given
| type you can only have one `Eq` or `Ord` instance for some
| type A.
|
| But in TypeScript you can have as many Ord interfaces as
| you want. I may want to order some `User` type by last
| login, first login or whatever I want, but in Haskell you
| need to overcomplicate it and create a newtype.
| Jenk wrote:
| I wouldn't expect never[] to mean 0 length array, tbf.
| It's a variable length array of _type_ never. It should
| be something akin to: declare foo:
| never[0];
| emteycz wrote:
| I think the correct solution is:
| declare foo: []; // fixed-length array (tuple) without
| any items
| armchairhacker wrote:
| foo[1] is undefined (literally the undefined symbol), in
| JavaScript index-out-of-bounds is ok.
|
| Although i don't know if TypeScript asserts that x[i] is
| T | undefined if x is T[] and the length is non-trivial.
| In strict mode this would be really useful as you can
| just do x[i]!, but it requires you to think a bit about
| index-out-of-bounds or at least throw on the undefined
| case.
| endgame wrote:
| > you can only have one `Eq` or `Ord` instance for some
| type A.
|
| This is a good thing - it stops you from using one Ord
| instance when putting As into a Map or Set, and then
| trying to pull them back out using another. Newtypes let
| you provide "additional" instances in a safe way.
| reuben364 wrote:
| I've managed to get the compiler to typecheck
| const x: Thing = exprA; const y: Thing = exprB;
| const z = [x, y];
|
| but somehow not: const z: Thing[] = [exprA,
| exprB];
|
| although exprA and exprB were pretty nasty typing wise.
| lhorie wrote:
| Yeah, the lack of specification for Typescript is quickly
| turning into the Achilles heel for the web tooling ecosystem.
| Several projects got the parsing part done, but a lack of
| specification and the relatively fast release cycle means it
| isn't really feasible to invest into actually implementing the
| meat of the TS compiler, so everyone is stuck with that last
| remnant of critical tooling running slowly.
| pcj-github wrote:
| No spec, but the golden files that check that the compiler is
| working as intended are very comprehensive. If you can generate
| the same ~38,000 files in this directory from the test cases,
| you're pretty much on par with tsc:
| https://github.com/microsoft/TypeScript/tree/main/tests/base...
| jacobr1 wrote:
| But that only works for a bug-for-bug compiler, no? If you
| want to build something like a linter, you couldn't use them
| to validate you are covering the fully syntactic range.
| Though I guess it would help look at coverage if your linter
| threw any errors, but it wouldn't validate correctness.
| msoad wrote:
| There is no specs so TypeScript's own impelemtption might not
| be accurate to what the team imagined on how things should
| work.
|
| That means if this implementation passes all TypeScript's
| tests, it is as good as TypeScript.
|
| Each new issue that is opened in either repo that shows
| discrepancy can be a new unit test added to the TypeScript
| codebase.
| scotty79 wrote:
| Good thing is that the only thing it does is type-checking. So
| if you miss some edge case it won't influence at all how your
| ts code works.
|
| The only thing is it may not catch some type error, which JS
| already does none of and TS can skip wherever you want or even
| implicitly and people are fine with that.
|
| If I get 98% of TS typechecking 10 times faster. It's a huge
| win.
| simplify wrote:
| You're vastly underestimating the advanced type system
| features TypeScript requires to typecheck even the most basic
| JavaScript code. It would be more like 10% rather than 98%.
| Intersection and union types are no joke, and are notoriously
| difficult to make performant.
| skybrian wrote:
| One way to collaborate would be to write a spec as you go,
| along with an associated test suite that can be run against
| multiple implementations.
|
| Presumably TypeScript has a test suite already that might serve
| as a starting point?
| newlisp wrote:
| But in the meantime he gets to work on a fun project, learn a
| lot and get paid for it at the same time ;)
| adamnemecek wrote:
| The way of dealing with cycles is generational arenas. Also where
| is the codebase? I would like to see the uses of unsafe.
| danenania wrote:
| If anyone happens to know: what are tsc's bottlenecks?
|
| I presume that if the bottleneck was just number of loop
| iterations, V8 would already be about as fast as C or any other
| language?
|
| Is it simply that accessing/mutating javascript data structures
| like objects, arrays, set, etc. is slow? Are those not fully
| optimizable by V8 for some reason? Or is it something else like
| the GC being slow?
| [deleted]
| cpcallen wrote:
| Yes, I'm curious about this too.
|
| Given that gopherjs[1] can compile Go to JavaScript and in some
| cases thereby achieve _better_ performance than the native Go
| compiler[2], it 's not obvious to me that a JavaScript program
| ported mechanically to Go is likely to be any faster than the
| original. It seems to me that the effort might be better
| applied to optimising the performance of tsc itself.
|
| [1] https://github.com/gopherjs/gopherjs [2]
| https://medium.com/gopherjs/surprises-in-gopherjs-performanc...
| newlisp wrote:
| This is a problem that can take advantage of good parallelism,
| Go will give you that and at the same time be more memory
| efficient.
| epolanski wrote:
| I'm not so positive about these projects.
|
| They are good willed but TypeScript doesn't have an official
| spec, it's an implementation-defined language. It's not C, it's
| not JavaScript, it's whatever each release wants it to be. This
| makes it very difficult to follow every version and evolution,
| you're consistently chasing whatever has been merged on release.
| folkhack wrote:
| And it's the exact reason I'm really reluctant to fully buy
| into Typescript... I always feel like I'm wrong about the "not
| an official spec" part of things but you are 100% correct in my
| understanding.
|
| Also on "it's whatever each release wants it to be" - that is
| hell on earth if you're looking to target consistent software
| processes and it only gets worse the more you scale (both
| solution and team).
|
| I get the value of Typescript - hell I've used it extensively
| myself... it's just that I can't stop feeling like this is all
| an anti-pattern: trying to make a loosely-typed language a
| typed language (if you can call it that?). =/
| BadOakOx wrote:
| Just from a week ago on HN:
|
| I'm making a Typescript type-checker in Rust:
| https://news.ycombinator.com/item?id=30033119
| epolanski wrote:
| Next week it will be Zig then
| slig wrote:
| Cannot wait for their webpack replacement.
| pjmlp wrote:
| The thing with all those projects is playing catch-up with
| upstream, and the possible variations from what is actually the
| official semantics depending on the TypeScript version.
| throwaway894345 wrote:
| Personally I wouldn't fret about keeping pace with upstream.
| Eventually the syntax will stabilize if it hasn't already. I'd
| be more worried about fidelity--does this produce the same
| result as the reference implementation? TFA says he's running
| against the "conformance" test suite, so that should keep
| things pretty tight and when bugs are found they'll be fixed in
| time and presumably the test suite will be updated accordingly.
| wwwigham wrote:
| The `conformance` suite is the "set of tests people added
| when a feature was first released", usually without the
| decade of following regression tests in the `compiler` suite.
| In TS itself, the `conformance` and `compiler` suite are
| considered the part of the same unit most of the time. (The
| compiler suite, however, is not helpfully subdivided by
| feature, since it oft deals with cross-cutting concerns!)
___________________________________________________________________
(page generated 2022-01-25 23:00 UTC)