[HN Gopher] Memory-efficient enum arrays in Zig
       ___________________________________________________________________
        
       Memory-efficient enum arrays in Zig
        
       Author : dist1ll
       Score  : 297 points
       Date   : 2023-09-18 12:00 UTC (11 hours ago)
        
 (HTM) web link (alic.dev)
 (TXT) w3m dump (alic.dev)
        
       | cmrdporcupine wrote:
       | In general I'm maybe somewhat disappointed that pattern matching
       | in Rust can't be expressed more as a type system trait-type thing
       | (hand waving here) that any struct could conform to, rather than
       | an explicit first class hardcoded object type with its own
       | storage structuring, etc?
       | 
       | I may not be expressing this cogently, but. I've implemented both
       | an AST (as in this article) and an opcode/bytecode interpreter
       | recently, and I feel that in large part Rust's enums are not
       | ideal for either:
       | 
       | I never had the memory usage concerns about the AST that this
       | author did, but I wanted to do things like: add a line number /
       | column attribute to every statement node. I had a `Stmt` enum,
       | and I had a choice: put line # and column on every enum
       | possibility (boilerplate and ugly), or put the enum in a new Stmt
       | struct that contains both the original enum and the line/col
       | attributes -- which involved a bunch of refactoring and also
       | didn't feel elegant. I feel there must be a more elegant type
       | system way that my "Stmt" struct could have been reworked, that
       | Rust isn't offering me.
       | 
        | And re: the opcode stuff, I am fairly sure that a pattern-matched
       | Rust enum is not an ideal encoding for a VM opcode interpreter
       | performance wise. But the language really does want to point you
       | in this direction, and the facilities for de-structuring patterns
       | are very seductive. I haven't gotten to yak shaving this yet, but
       | again I feel like there's likely some potential for improvements
       | in the type system that could open things up so one could get
       | pattern matching facilities while using one's own preferred lower
       | level impl.
       | 
       | I dunno. Thoughts.
        
         | aw1621107 wrote:
         | > pattern matching in Rust can't be expressed more as a type
         | system trait-type thing (hand waving here) that any struct
         | could conform to, rather than an explicit first class hardcoded
         | object type with its own storage structuring, etc?
         | 
         | Do you think you can provide a more concrete example? First
         | thing that comes to mind for me is some kind of structural
         | typing [0], but I'm not confident I'm interpreting your comment
         | correctly.
         | 
         | [0]: https://en.wikipedia.org/wiki/Structural_type_system
        
           | cmrdporcupine wrote:
           | Other people who are better with these things could probably
           | conceptualize it better than me.
           | 
           | What I'd like to be able to do is declare a trait that says
           | "this thing can yield the following patterns, can destructure
           | in this pattern, and these are the functions you'd call to
           | extract references to those values"
           | 
           | And then be able to use match on it as usual.
           | 
           | Putting it another way, I wish an algebraic data type "enum"
           | was something any struct could "be", with the type system
           | providing the means to express that that's the case, and the
           | programmer being able to provide the "how" and with existing
           | Rust enums merely being the default way of doing this.
        
             | dan-robertson wrote:
             | Haskell and F# have this feature. In F# it is called
              | _active patterns_. In Haskell the feature is _pattern
              | synonyms_ (maybe combined with view patterns) and is used
              | to e.g. pattern match on the prefix or suffix of a
              | sequence-like structure, or to match a tree as an element
              | and left/right node without the caller seeing a bunch of
             | other tree metadata.
        
             | aw1621107 wrote:
             | Interesting idea. I feel a bit conflicted about it, but
             | it's probably something I'd have to think about.
             | 
             | I suppose it's possible to somewhat approximate this in
             | existing Rust by creating a trait with a function that
             | returns a destructure-able type, performing the
             | transformations that you would have implemented in your
              | hypothetical trait, then pattern matching over that. Not sure
             | if that would be "good enough".
        
         | patrec wrote:
         | Several languages have what I think you are asking for. Have a
         | look at Scala's extractors or F#'s active views.
        
           | foderking wrote:
           | you mean f# active patterns?
        
       | [deleted]
        
       | eximius wrote:
        | I do wish proc macros could evolve to be able to query the
        | compiler for information (which would necessitate careful thought
        | about adding stages to compilation). I've wanted to know "does this
       | struct impl this trait" or "give me a list of all concretely
       | implemented traits" or various things like this that would be
       | very, very useful in a proc macro.
        
         | Arnavion wrote:
         | IIRC the compiler runs plugins in two stages. The first stage
         | gets the AST before any typechecking and it can make
         | modifications to the AST; macros and some clippy lints run in
         | this stage. The second stage runs after typechecking so it gets
         | type information, but it cannot make any modifications; other
         | clippy lints run in this stage.
        
       | npigrounet wrote:
       | [dead]
        
       | airstrike wrote:
       | I only partially understand this article but this seems
       | incredibly relevant to me since I'm planning on writing a
       | spreadsheet engine in rust. Cell values require me to have
        | something like (this very poorly written POC code, sorry):
        | 
        |     pub enum Expr {
        |         Number(i32),
        |         String(String),
        |         Reference(Position),
        |         Binary(Box<Expr>, char, Box<Expr>),
        |         Function(String, Vec<Expr>),
        |     }
       | 
       | I'll keep reading and studying! If anyone has more resources to
       | throw at me, I'm all ears
        
         | nhatcher wrote:
          | Hey, I've actually built a spreadsheet engine in Rust. Not open
         | sourced, but could give you some pointers. You are going to hit
         | lots of performance issues before you can have some gains using
         | the methods of the article. The single most difficult problem
         | is the evaluation strategy.
        
           | airstrike wrote:
           | Hey, thanks for the offer. I have been thinking about
           | evaluation and the extent of my strategy so far has been to
           | consider it as a dependency graph, which led me to this
           | interesting article:
           | https://hyperformula.handsontable.com/guide/dependency-
           | graph...
           | 
           | Also it sounds like the right next step for me is to read
           | more on incremental computation. Someone on reddit just
           | pointed me to adapton:
           | https://docs.rs/adapton/latest/adapton/
           | 
           | Actually implementing anything with Rust is on my TODO list
           | (I'm spending some time on the UI prototype right now)
           | 
           | I will give a couple things a try and then, if you don't
           | mind, I can ping you via email. I was checking out your
           | website and loved the read on the parser -- not to mention A
           | MAZE (played all the way to level 4!)
        
             | dan-robertson wrote:
             | (just jumping in here) You may want to take a look at
             | https://www.scattered-thoughts.net/writing/an-opinionated-
             | ma... which describes things on the spreadsheety and not-
             | very-spreadsheety ends of the spectrum (moreso the latter).
             | If you follow some of the links there you should find some
             | good descriptions of various implementation strategies,
             | e.g. Naiad style timely dataflow or incremental-style
             | pseudotime.
        
             | nhatcher wrote:
             | Wow! Glad you had a look around. Yes, please send me an
             | email whenever you feel like.
             | 
             | And yes, you are going to need some sort of "support graph"
             | and topological sorting. But this is way more complicated
              | than it looks. Most functions in a spreadsheet don't
              | know their dependencies until you are actually computing
             | them, like `LOOKUP` or `IF`. Some functions like
             | `RANDBETWEEN` or `NOW` change their value every time. In
             | some modern spreadsheets some functions like `SORT` or
             | `UNIQUE` spill their values to some other cells.
             | Calculating dependencies correctly in all those situations
             | can be challenging.
             | 
             | There is not a lot of literature on spreadsheet
             | implementation and reading code of existing open source
              | implementations is hard. An exception is Peter Sestoft:
             | 
             | https://mitpress.mit.edu/9780262526647/spreadsheet-
             | implement... and https://www.itu.dk/people/sestoft/funcalc/
             | 
             | My recommendation would be to first set the scope. Then
              | hand-code the lexer and parser and drown those two in
             | tests. Excel formulas can be quite complicated! There are
             | lots of fancy things you can do there like use arena
             | allocators and other optimizations. I would try to avoid
             | that on a first go.
             | 
             | Here is something that I have found extremely convenient.
             | Excel saves the cell formula AND the resulting value.
             | Meaning that to create a test case you just need to create
              | an Excel workbook (and an xlsx parser, of course).
             | 
             | Doing the UI for a spreadsheet is an equally challenging
             | problem.
             | 
             | Good luck!!!!
             | 
             | Oh, and remember this thread from a couple of days ago.
             | Might be inspiring:
             | https://news.ycombinator.com/item?id=37527720
        
         | hathawsh wrote:
         | Sounds like a fun project. If it's intended for ordinary users,
         | one thing you should expect them to do is add content to the 4
         | extreme corners of a sheet and see if the spreadsheet engine
         | falls over as a result. If you allow 1 million by 1 million
         | cells, and you store a "null" for every cell not filled in,
         | then the engine will run out of memory. That means you might
         | consider storing the cell content in a sparse manner. One way
         | to do that is with a hash map implementation such as
         | "hashbrown". Just a thought.
         | 
         | (To clarify why I'm bringing this up, the points in this
         | article are low level details that you probably don't need to
         | think about if you start with a hash map and thereby avoid
         | running into memory constraints initially.)
        
           | airstrike wrote:
           | Thanks, great point indeed. I am looking into this
           | https://github.com/rust-lang/hashbrown
           | 
           | The way I think about it -- rather naively, I suppose -- is
           | that I care more about the references cells make to each
           | other than the actual grid of cells displayed on a table. The
           | latter feels more like a "view" of the data than an actual
           | data structure?
           | 
           | This also seems to align with the relative priority of
           | (sorted from highest to lowest): figuring out the order of
           | evaluation, calculating those evaluations, and finally
           | displaying the results of the evaluation
        
         | vlovich123 wrote:
         | The problem being highlighted is that if you have widely
         | disparate sizes of the variants and/or a lot of these in an
          | array, you're hurting performance because there's a lot of
          | space being wasted on padding.
         | 
          | A common technique that came from gaming is, if you have an
          | array of structs (AoS), e.g. array<struct { health: int,
          | ammo: Ammo, ... }>, to break it up into a struct of arrays
          | (SoA): struct Humans { healths: Vec<int>, ammo: Vec<Ammo>,
          | ... }. Then the ith index into those Vecs is the ith Human in
          | the AoS layout. (Parallel Vecs here are just for example;
          | they're not optimally efficient, because the bookkeeping of
          | length and capacity on a per-field basis is wasteful, as that
          | info is duplicated.)
         | 
          | This is basically a similar idea, except for enums, done
          | automatically in a way that Rust can't. It's possible the
          | problem is a bit overrated in terms of how big it really is.
          | Also, in Rust you may be able to wrap the third-party type in
          | your own, and then the derive-macro approach probably can
          | work.
         | 
          | Re your spreadsheet, just keep this in mind as a possible
          | optimization; you have to first figure out if you're building
          | for speed (ie you've done this a few times and you know this
          | is probably a bottleneck) or experience (ie build for
          | simplicity and understanding).
        
           | airstrike wrote:
           | Thank you -- definitely building it for speed which is why
           | this felt topical. I really appreciate your explanation, it's
           | super clear.
        
           | rudedogg wrote:
            | On the subject of SoA, Zig has a cool data structure for
           | that: https://zig.news/kristoff/struct-of-arrays-soa-in-zig-
           | easy-i...
        
       | sdfghswe wrote:
       | More or less OT.
       | 
       | I keep hearing about both Zig and D as interesting C-like low
        | level languages with better safety/ergonomics. I wonder if
       | someone who's familiar with both would like to discuss the main
       | differences...?
        
         | esjeon wrote:
         | D is a GC-based C++/Java-like OOP language. People used to call
         | it "a C++ done right", but it was never simple and didn't have
         | any clear edges against its competitors. TBH, one simply should
         | not build on top of C++ - it's a beast that no one can
         | comprehend, and it only keeps growing.
         | 
          | In the case of Zig, I think it's a very good modern C alternative.
         | Zig can do what C can do, can interoperate w/ C (e.g. importing
         | C headers), and offers new features that are useful (comptime,
         | slices, error type). It's like Go, but without GC and a large
         | runtime.
        
           | sdfghswe wrote:
            | Oh man, I was loving your description of Zig up until you
            | said "it's like Go"...
        
             | Guvante wrote:
              | I haven't heard of Zig referred to as like Go except in
              | the context of trying to replace C.
             | 
             | (Rust is more aiming for C++)
        
         | ksec wrote:
          | Well WalterBright could have answered that. But since HN
          | started taking the piss out of WalterBright for replying in
          | every single Zig (or Rust) thread, he stopped commenting on
          | most if not all programming language topics. Sigh.
        
           | saagarjha wrote:
           | There are productive ways to comment on threads without being
           | a shill for your own thing.
        
         | Sharlin wrote:
         | Zig is C-like. D is C++-like. Zig has first-class compile-time
         | metaprogramming. D is a kitchen sink language; it has _many_
         | more features than even C++ does. Zig is minimalistic and
         | strictly imperative rather than multi-paradigm. D has a cult
         | following but hasn 't really taken off. Zig is a _relative_
         | newcomer trying to prove itself.
        
           | systems wrote:
            | i think the term "D has a cult following" is a bit harsh.
            | D's community is small, that is it, nothing cult-like; they
            | don't over-defend D, and most of the community seem to be
            | happy using other languages etc.
           | 
           | but i agree with D being a kitchen sink language, trying to
           | be many things, but never being the absolute best at
           | anything, which i think is why it never really took off
           | 
           | i am really hopeful for Zig, trying to be a more pragmatic C
           | replacement
        
             | Semaphor wrote:
             | > i think the term "D has a cult following" is a bit harsh
             | D's community is small, that is it, nothing cult like
             | 
             | I think "cult following" is used here like you would for
             | movies, it's not actually cult-like, it's more that it's a
             | rather small, close-knit community.
        
           | pron wrote:
           | > Zig is C-like
           | 
           | Only in the sense that the language is simple and small, but
           | C pays for that by being inexpressive (you can find families
           | of C++ programs for which an equivalent C program would
           | require exponentially more code). Zig, on the other hand, is
           | small and simple while being just as expressive as C++. It
           | achieves that through a remarkable observation: many
           | primitive features can be avoided with a powerful enough
           | compile-time computation mechanism with introspection.
           | 
           | Zig is so revolutionary in its design that it's hard to
           | compare it to anything else. Or, put another way, any
           | comparison is sure to miss out on some very foundational
           | piece. So yes, in some sense Zig is like C (simple, low-
           | level), but it's as expressive as C++. In another sense, Zig
           | is like Scheme (minimalistic elegance), and yet it feels
           | familiar and it does not rely on macros for its core power.
           | In other ways it's like Go (easy to learn, easy cross-
           | compilation), but it gives full control and is more flexible.
           | 
           | At this point, Zig is sui generis.
        
             | 1980phipsi wrote:
             | > Zig is so revolutionary in its design that it's hard to
             | compare it to anything else.
             | 
             | I mean, Zig has an interesting combination of
             | functionality, but I wouldn't say that the individual
             | pieces are revolutionary. Who knows though, the whole could
              | be more revolutionary than the sum of its parts...
        
               | ngrilly wrote:
               | Zig's take on comptime is quite new (which is the main
               | point of the article being discussed). Error handling and
               | async/await as well.
        
               | 1980phipsi wrote:
               | Zig's take on comptime is slightly different from C++ and
               | D, but that doesn't make it revolutionary. It has first
               | class types, which allows you to pass them to a function
               | along with normal function parameters, instead of having
               | a separate syntax for it like C++ and D. But it's not
               | really new.
        
               | pron wrote:
               | That's a little like saying that smartphones weren't
               | really new because we had mobile phones and calculators
               | and cameras and video cameras and handheld games before.
               | 
               | Zig's comptime is revolutionary and really new in that it
               | replaces many other separate features and is the basis of
               | Zig's minimalism. Figuring out how to discard features is
               | much more difficult than adding them, and simplicity is
               | something that's impossible to add once it's lost. The
               | message is not the existence of a rather general compile
               | time execution, but that with a particular design of such
               | a mechanism you can do away with other things.
        
               | pron wrote:
               | To say something truly original you don't have to invent
               | new words.
        
             | Sharlin wrote:
             | Yeah, I had written something like "Zig is pretty much
             | designed around the comptime system" at first, should've
             | probably kept it :D
        
             | jandrewrogers wrote:
             | The unapologetic embrace of compile-time metaprogramming in
             | a clean model drives most of my interest in Zig. I use
             | metaprogramming heavily in C++ to dramatically reduce code
             | size and increase type safety. Contrary to popular
             | assumption, in C++20 metaprogramming is relatively concise,
             | tidy, and maintainable, unlike the eldritch horror that was
             | metaprogramming in C++11. But I don't think anyone would
             | describe it as elegant or designed. I haven't had time yet
             | to try porting C++ metaprogramming to Zig in order to see
             | what is possible. Most other systems-y languages have a
             | much weaker metaprogramming story than C++.
             | 
             | Many developers underestimate the importance of
             | metaprogramming in systems languages for ensuring quality
             | and security. Defects scale proportional to the lines of
             | code. Strong metaprogramming facilities can reduce the
             | lines of code required several-fold.
        
               | gpderetta wrote:
               | C++11 gave us variadics which opened a new
               | metaprogramming world. For true horror try cons-based
               | typelists in C++98.
        
           | 1980phipsi wrote:
           | I agree with most of this, but I would say that D was
           | probably designed to be an alternative to C++ or Java. Zig
           | shares many of C's design philosophies, though I think D's
           | syntax is a little closer to C's than Zig's is to C's.
           | 
           | D also has a betterC mode that removes some of those "kitchen
           | sink" features, but I couldn't say how popular it's use is.
           | Many people who use D like some of those features. Zig's more
           | minimalistic approach to language features is closer to C's
           | approach.
           | 
           | Zig likely has been influenced by D (among other languages),
           | but I wouldn't be surprised if D's new importC functionality
           | has been influenced by Zig.
           | 
           | I don't use D (or C or C++) at work, but learning D and using
           | it at home has made me a much better programmer in the
           | languages I use for work. Maybe I would have learned those
           | lessons with C or C++, but I guess I took more easily to D
           | than them.
        
             | Sharlin wrote:
             | > I agree with most of this, but I would say that D was
             | probably designed to be an alternative to C++ or Java
             | 
             | You're agreeing with me then! D was pretty explicitly
             | created due to WalterBright's (incidentally a prolific
              | 60k+-karma HN user!) frustration with C++ and presumably
              | the problems that stem from its requirement to maintain
              | almost 100% C compatibility. Bright is the founder of
             | Digital Mars, a small company developing C and C++ (and D)
             | compilers, so he has first-hand experience. Since 2007 D
             | has been co-developed by Andrei Alexandrescu, a well-known
             | C++ expert and one of the pioneers of template
             | metaprogramming.
        
         | [deleted]
        
         | optymizer wrote:
         | Having written a few toy compilers for C, to me D is the most
         | elegant of the C families of languages.
         | 
         | For example, consider the versatility of the '.' operator. It
         | does pointer dereference, static field access and
         | class/namespace member/method access. In all those cases I want
         | to access "a thing from this other thing". Having 3 operators
         | in a language (., ->, ::) is easier for a compiler writer but
         | it puts more cognitive load on the developer.
         | 
         | Now consider type extensions. As a language designer you could
         | think: "if a developer wishes to add methods to a type Vector
         | that exists in a third party library, then in their project,
          | the developer would use some special syntax" like maybe:
          | 
          |     class Vector { void json() { ... } }
         | 
         | and maybe the developer also needs to have it even repeated
         | across a header file and an implementation, but... why? It's an
         | idea, but it's not a law of the Universe. There is a better
         | way.
         | 
         | Instead, the compiler can help you out. Just declare a regular
          | function anywhere like so:
          | 
          |     void json(Vector v) { ... }
          | 
          | which is not special in any way, and then be free to invoke
          | it as:
          | 
          |     json(v);
          | 
          | or:
          | 
          |     v.json();
         | 
          | Why, if it isn't our familiar dot operator! It also does type
         | extensions via some syntax sugar? What an elegant solution.
         | 
         | This is just one feature of D, and it's filled with features.
         | Though I don't really use D in my day to day work, it's
         | wonderful to see how the authors thought about hard problems in
         | language design and implemented elegant solutions. I think D
         | deserves more recognition for the things it absolutely hit out
         | of the park as a general purpose programming language.
        
       | Joker_vD wrote:
       | Wait, the thing the article calls "array of variant arrays (or
       | AoVA)" is _not_ the canonical way to decompose tagged unions?
        | Seriously? [AxBxC] ≅ [A]x[B]x[C], that's the most obvious
       | thing, isn't it?
       | 
       | Edit: Actually, never mind, as the comment points out, the
        | correct equation should be [A+B+C] ≅ [A]x[B]x[C] and it's not
       | obvious at all whether it's actually correct.
        
         | pornel wrote:
         | Things like that can be done by specific collections (e.g. most
         | ECS implementations store components per "archetype").
         | 
          | But from a language perspective the limiting factor is that
         | you're always allowed to take a reference to any instance of an
         | enum. This means the tag must be stored in every instance. If
         | you allow mutable references to enums, you must have enough
         | space for writing any variant there through the reference.
        
         | ratmice wrote:
          | Don't you mean something like [A+B+C] ≅ [A]x[B]x[C]?
        
         | kccqzy wrote:
         | Of course not: you are mistaking sum types and product types.
         | 
          | Even if we assume they are product types, [AxBxC] ≅
         | [A]x[B]x[C] is still not correct. The former doesn't allow the
         | number of As and Bs and Cs to differ: the latter does. So the
         | latter strictly speaking admits more values than the former.
         | 
         | Since both of them allow infinite number of values, to make a
         | size comparison we will use generating functions. A list of A
         | (or [A] in your notation) allows an empty list (one possible
         | value), a one-element list (as many values as A itself), a two-
         | element list... which becomes 1+A+A^2+...=1/(1-A). The
         | beautiful thing about calling them sum types or products types
         | is that you can manipulate it just by summing or multiplying
         | respectively.
         | 
         | So the number of values for [AxBxC] is identified by 1/(1-ABC).
          | For [A]*[B]*[C] it's 1/(1-A)/(1-B)/(1-C) which simplifies to
          | 1/(1-A-B-C+AB+AC+BC-ABC). Now it becomes obvious+ that this form
         | admits more values and you can in a sense quantify how many
         | more!
         | 
         | +: Okay perhaps it's only obvious if I also include a Venn
         | diagram but diagramming is beyond what I can do on HN.
        
           | Joker_vD wrote:
           | Yeah, and if you carefully expand
           | 1+(A+B+C)+(A+B+C)^2+(A+B+C)^3+..., there'll be strictly more
           | summands than in
           | (1+A+A^2+A^3+...)x(1+B+B^2+B^3+...)x(1+C+C^2+C^3+...) due to
           | more flexible ordering of elements, so they're not
           | isomorphic.
           | 
           | I still think it's the most obvious thing that [A+B+C] at
           | least maps surjectively onto [A]x[B]x[C] :)
        
         | catgary wrote:
         | It's a section/retract, definitely not an isomorphism. You'll
         | need to carry around the index data, and any sort of array
         | update will incur a crazy upfront cost.
        
           | Joker_vD wrote:
           | Eh, I imagine it should be possible to maintain an out-of-
           | line skip-list or an array of tombstones or something to deal
           | with deletion/tag mutation (although in my experience with
           | ASTs that's not that common of a requirement); and allocation
           | can be done quite effectively with the help of the good old
            | .bss section: just grab a couple million of every type of
           | node at the program's startup, virtual memory is almost free.
        
         | [deleted]
        
       | kazinator wrote:
        | You can solve this problem in a few clear lines of GNU C. We
        | declare two sister types, one of which is packed:
        | 
        |     typedef struct taggedunion {
        |         unsigned char tag;
        |         union tu {
        |             uint64_t u64;
        |             uint32_t u32;
        |             uint8_t  u8;
        |         } u;
        |     } taggedunion_t;
        | 
        |     typedef struct __attribute__((packed)) packed_taggedunion {
        |         unsigned char tag;
        |         union tu u;
        |     } packed_taggedunion_t;
       | 
        | Then accesses to our dynamic array will convert between the
        | packed stored representation and the unpacked one in a
        | straightforward way:
        | 
        |     taggedunion_t array_get(packed_taggedunion_t *a, size_t i)
        |     {
        |         return (taggedunion_t){ a[i].tag, a[i].u };
        |     }
        
         | WhereIsTheTruth wrote:
          | For any language to be labeled a "better C", it must solve
          | this problem; having better enums is a requirement.
          | 
          | It's actually one of the reasons I initially gave Zig a try:
          | not comptime, but tagged unions.
        
         | cryptonector wrote:
         | But it's C. The idea here is to have higher-level, type-safe
         | languages do this transformation automatically.
        
         | [deleted]
        
           | [deleted]
        
         | loeg wrote:
         | Your packed union eliminates pure padding, yes, but it still
         | costs 9 bytes to store a u8 value _and_ misaligns accesses for
         | wider elements. These are both significant misses on  "solving
         | the problem." You also need to copy each element on access,
         | which could be expensive for larger plain-old-data types, or
         | especially for non-POD types.
        
         | masfuerte wrote:
         | But this still takes 9 bytes (at least) to store a tagged
         | uint8_t.
        
           | kazinator wrote:
           | The article's approach requires 8 bytes, plus a separate
           | array of 1 byte tags.
           | 
           | The article reduces it to 4 bytes, plus an array of 1 byte
           | tags, by dropping the requirement for a 64 bit member.
           | 
           | If we have to grow the array, we have to do two
           | reallocations.
           | 
           | Caching is worsened: the tag and content are going to be in
           | separate cache lines.
        
             | masfuerte wrote:
             | The final approach in the article is an array for each
             | unique object size plus tagged indices/pointers. This would
             | take one byte per uint8_t and doesn't suffer from the
             | problems you mention, though it does have others. If memory
             | pressure is your main problem it's a big win.
        
       | BaculumMeumEst wrote:
       | I was checking out gameboy emulators recently written in C++,
       | Rust, and Zig that have seen recent development.
       | 
        | I couldn't get the Zig projects to build, which is
        | understandable since Zig has had a lot of breaking changes in
        | its build system and in the language in general (which is why
        | it comes with a caveat not to use it in production yet).
       | 
       | But given the sheer number of comments I've read over the years
       | recommending Rust as a replacement for C++, I was more surprised
       | to see that I couldn't really find a single Rust project that had
       | correct timings (the frame rates were incorrect and audio
       | distorted) and worked with most ROMs I tried.
       | 
       | Meanwhile, the C++ projects I found built without issue or
       | hassle, had correct timings, ran all the games I've tried, had
       | lots of features, etc.
       | 
       | What gives? Is emulation just not a good use case for Rust?
        
         | 0xDEF wrote:
         | Building large-scale C++ projects has become less of an issue
         | because of CMake slowly becoming the default build tool.
        
           | anymouse123456 wrote:
           | An alternative opinion (that I happen to hold) is that the
           | unbelievable pain and frustration of dealing with CMake has
           | been at least one of the forces that have led to the Cambrian
           | explosion of new programming languages and build systems in
           | recent-ish years.
        
             | ghosty141 wrote:
             | While CMake is pretty annoying the problem from my
             | experience are people using it incorrectly and doing stupid
             | stuff that you can't change if you're using the project as
             | a library.
             | 
             | If every library would be properly "packaged" using CMake
             | would be far less annoying or frustrating. Just recently we
             | had the case where a 3rd party library author of a sizeable
             | c++ library set -Werror in his library which wreaked havoc.
        
         | [deleted]
        
         | coldtea wrote:
         | > _Meanwhile, the C++ projects I found built without issue or
         | hassle, had correct timings, ran all the games I 've tried, had
         | lots of features, etc. What gives?_
         | 
          | Aside from the fact that the C++ projects are from seasoned
          | C++ developers (who have existed in huge quantities for
          | three decades now), not people starting with Rust as their
          | first "native" language and using an emulator as a toy
          | project? C++ has existed for three decades on top of Rust's
          | entire 8-year life since v1.0. And given the adoption curve
          | over time, most Rust devs have at most 1-4 years of Rust
          | experience.
          | 
          | Or the fact that those C++ projects are probably much more
          | mature, with more man-years in them compared to the Rust
          | ones, some even being major community projects?
        
         | raincole wrote:
         | > What gives?
         | 
         | There are still more good C++ programmers than good Rust
         | programmers.
        
           | tmccrary55 wrote:
           | This.
           | 
           | Many people are still learning Rust and an emulator is
           | probably a good way to get your hands dirty.
           | 
           | I'm doing something similar with a compiler project and my
           | code is gradually getting better and more idiomatic.
           | 
           | But I'm still doing a lot of stupid things and getting the
           | hang of it.
        
             | slikrick wrote:
              | It's not at all about "quality of programmer", and you
              | shouldn't reduce extremely vague arguments to that. It's
              | literally about the time needed for a quality program to
              | be made and completed.
             | 
             | those C++ emulators have often been in development for 3-5x
             | as long as a similar Rust one.
        
         | rahkiin wrote:
         | I recently found https://github.com/Rodrigodd/gameroy to be a
         | very complete implementation
        
           | BaculumMeumEst wrote:
           | I did see that one, but it doesn't seem to build on Mac.
        
         | hu3 wrote:
         | Related: for all the flak GC gets, C# was used to produce two
         | amazing emulators:
         | 
         | 1) https://github.com/Ryujinx/Ryujinx
         | 
         | 2) https://github.com/TASEmulators/BizHawk
        
           | WhereIsTheTruth wrote:
            | Ryujinx doesn't use the GC for their sub-systems; they
            | use a mix of manual memory allocation through their own
            | allocators, or RC.
            | 
            | For such a project to succeed, you have to be able to
            | optimize your memory allocation strategy.
            | 
            | If you just rely on the GC, you'll have bad surprises (why
            | does my game stutter during gameplay?!). They already have
            | several due to their double dip on the JIT: nothing
            | prevents the JIT compiler from compiling during your
            | gameplay, causing massive stutters, just like how shader
            | compilation is an issue at runtime.
        
           | ghosty141 wrote:
           | Personally I think C# is a great language to get stuff done.
           | The big problem it has is the portability (especially when it
           | comes to using it with a GUI).
        
           | agentultra wrote:
           | GC often gets more flak than is deserved.
           | 
           | Often such flak ignores the differences between throughput
           | and latency.
           | 
            | For long-lived processes you'll end up writing some kind
            | of garbage collection system.
        
           | guipsp wrote:
           | Fwiw, ryujinx uses a jit on the hot path, and most (but not
           | all!) bizhawk cores are not actually c#.
        
             | hu3 wrote:
              | Ryujinx uses Microsoft's .NET JIT technology [1], which
              | is the best available for C# codebases. Nothing new
              | here, since it needs to translate the Switch's ARM
              | instructions to x86 machine code somehow.
             | 
             | [1] https://devblogs.microsoft.com/dotnet/the-ryujit-
             | transition-...
        
         | jackmott42 wrote:
         | C++ has an ages old tried and tested ecosystem for game stuff
         | which probably makes a lot of rendering and audio tasks easier.
         | It could take Rust years to catch up even supposing the
         | language is inherently better.
        
         | Shish2k wrote:
         | As somebody who has written the same gameboy emulator in C++,
         | Rust, and Zig (as well as C, Go, Nim, PHP, and Python) - I have
         | yet to find a place where language affected emulation
         | correctness.
         | 
         | Gameboy audio is kind of a pain in the ass (at least compared
         | to CPU, which is fairly easy, and GPU, which is easy to get
         | "good enough" if you don't care about things like palette
         | colours being swapped mid-scanline) - and some languages take
         | more or less code to do the same thing (eg languages which
         | allow one block of memory to be interpreted in several
         | different ways concurrently will make the "interpret audio RAM
         | as a bunch of registers" code much shorter with less copying) -
         | but in my case at least, each one of my implementations
         | actually has the same audio distortions, presumably because I'm
         | misreading some part of the hardware spec :P
         | 
         | https://github.com/shish/rosettaboy/
         | 
         | (Also yes, the zig version is currently failing because every
         | time I look at it the build system has had breaking changes...)
        
           | BaculumMeumEst wrote:
           | Oh that's so cool! Thanks for sharing. I skimmed a few of
           | these and I'll definitely take a closer look later. I enjoyed
           | your observations on each language as well.
           | 
           | From a reader standpoint, I enjoy reading emulators written
           | in C the most. It sucks when C is missing features for
           | complex tasks, but emulation seems to fit nicely into what it
           | can do without black magic fuckery. Of course as someone not
           | super experienced with C, writing it feels like an endless
           | sea of "oh god why is this segfaulting".
           | 
           | Rust OTOH bothers me to look at because there are so many
           | sigils and language doodads for functionality that seems like
           | it should be really straightforward but whatever, I'm sure
           | that's just because I've barely used it and can't understand
           | it. I've been learning C++ recently because learning
           | materials are usually tailored to it for stuff I'm interested
            | in (vulkan currently), while ignoring a constant nagging
            | feeling that I should stop being a curmudgeon and take a
            | closer look at Rust.
           | 
           | I've tried Zig but am not looking too hard atm for many of
           | the reasons you mentioned in your repository, I'm hoping the
           | language will stabilize at some point and that LLMs will help
           | with some of the repetition/verbosity.
        
         | kristoff_it wrote:
         | wrt Zig, according to some it's a great fit for emulators, so
         | maybe that's why there are a few :^)
         | 
         | https://www.youtube.com/watch?v=jkkUS0nmdsg
        
         | ilyt wrote:
          | ...probably the fact that most of the C/C++ projects you'd
          | find in emulation have existed longer than Rust itself?
          | 
          | I'd imagine most of the ones in Rust were written for fun
          | rather than to best the established ones.
        
         | wiz21c wrote:
          | I'm currently writing an emulator in Rust (for Apple 2). In
          | some aspects I'm better than my C++ counterparts... BUT
          | those counterparts have _years_ and _years_ of little
          | refinements that make a _huge_ difference in whether some
          | games run or don't. There's no question about that.
          | 
          | Besides, Rust is just fine. I use a bit of threads, a bit
          | of OpenGL, and none of these is a problem. I'd venture to
          | say that the Rust compiler is really good and that allows
          | me to code things without worrying about optimisation.
          | Older emulators started years ago and often had to optimize
          | by hand, which leads to not-so-readable code; just your
          | usual legacy code story.
         | 
         | Finally, the cargo tool is a modern way to build the code and
         | I've been able to port my code to Linux, windows, MacOs and
         | raspberry pi without a single code modification (besides
         | expected stuff like audio frequency, file system, etc being
         | different).
         | 
         | The Rust crowds are just too new to have produced good
         | emulators. Give us time :-)
        
           | raphlinus wrote:
           | I'm also noodling on an emulator, shooting for cycle accuracy
           | (to the point where you'd be able to run vapor lock demos).
           | Mine passes the 6502 functional tests but not yet the Tom
           | Harte suite. To add to the difficulty, I want to run it on an
           | RP2040 and bit-bang the video. Is yours up in an open source
           | repo? Mine isn't (yet) but I'm open to joining forces.
           | 
           | I started mine about 6 years ago, put it on the shelf, and am
           | getting back to it. Overall I think Rust is working well for
           | it (and for pico DVI out), but I can also see the advantages
           | of Zig for something this low-level and specialized.
        
             | wiz21c wrote:
              | 133MHz, that's 133 cycles per instruction for a 6502
              | (assuming you emulate only the 6502). Seems doable, but
              | then there's the BCD stuff, and that's rather tough to
              | emulate (you'll definitely need a lot of cycles to
              | reproduce that)...
              | 
              | But that sounds real cool. Would you share your project?
        
               | raphlinus wrote:
               | It's a bit tricky, but my decimal adc is 19 lines of
               | code, all of which is fairly straightforward, mostly bit
               | fiddling stuff that I expect will compile to a few
               | instructions each.
               | 
               | I need to go through an open source releasing process but
               | I'll get that going. Thanks for your interest!
        
         | [deleted]
        
         | panick21_ wrote:
          | Are you serious? A single specific piece of software you
          | like doesn't exist in a specific language and you blame the
          | language for that? You see how that makes no sense, right?
        
           | itishappy wrote:
            | That's not the situation. The software does exist and
            | doesn't work. "Is this because of the language?" seems a
            | reasonable question.
        
             | pdimitar wrote:
             | It's not a reasonable question at all. My first reaction
             | was: "maybe the authors did it as a toy project". If nobody
             | is paying them to do a professional and bit-perfect GameBoy
             | emulator then there should be no expectations that the
             | software is going to be good.
        
               | itishappy wrote:
               | That's an insightful reply, and an important point to
               | consider. There's been a number of great answers in this
               | thread, and the general consensus seems to be "the
               | language is a great fit, but the projects haven't had
               | time to mature."
               | 
               | These seem (to me) like reasonable responses to a
               | reasonable question.
        
         | caskstrength wrote:
         | > What gives? Is emulation just not a good use case for Rust?
         | 
          | You fell for Rust PR (which is understandable, considering
          | that both this site and Reddit are full of loud Rust
          | fans/influencers). The language provides memory safety and
          | some fancy ML-like syntax, and that is it. It won't
          | implement a good emulator for you or make you a better SW
          | developer.
        
         | dogben wrote:
         | The most accurate IBM PC emulator is in rust.
         | https://github.com/dbalsom/martypc
        
         | xmodem wrote:
         | I think its a perfectly fine use case but we're not yet at the
         | point where that community has reached a high enough level of
         | rust adoption that there are a significant number of mature
         | projects, even for something as relatively simple to emulate as
         | a gameboy.
        
         | insanitybit wrote:
         | Emulation seems fine for Rust. I don't think there's a good
         | answer to this anecdote. One option is that when I learned C++
         | I built an emulator to learn how to write an emulator, whereas
         | people writing emulators in Rust are perhaps often trying to
         | learn Rust. Maybe that leads to a difference.
         | 
         | It's basically impossible to say though, we'd need to do a full
         | survey of the C++ and Rust emulator ecosystem and look at which
          | bugs are most prevalent, control for authors' experience and
         | skill, etc...
         | 
         | edit: I'll also note that when I built an emulator there were
         | guides for it in C++. I don't know what materials exist in Rust
         | but building an emulator is actually a project that I would
         | consider C++ for (if for learning purposes) just because it's
         | so well documented how to build one using the language.
        
         | sdfghswe wrote:
         | > What gives?
         | 
         | My guess based on no evidence whatsoever: The Rust community
         | seems to have lots of low effort enthusiasts - people whose
         | primary motivation is "use Rust" rather than "solve a problem".
         | People whose motivation is to solve a problem (in this case
         | build an emulator so they can play a game or whatever) tend to
         | be a lot more oriented towards the end-result, rather than what
         | tool they're using. I'm pretty confident this isn't an issue
         | with the language.
        
           | mkl wrote:
           | I think it's more "learn rust" than "use rust", and nowadays
           | a lot of partly written learning projects are published on
           | GitHub. A lot of the similar "learn C++" projects happened
           | earlier and either stayed private or are quite good by now.
        
           | jandrewrogers wrote:
           | Most of the developers I know that are learning Rust do not
           | come from a systems language background. They have to learn
           | the domain as well as the language, which is a high bar.
           | Learning to write systems-like code well takes a lot of time
           | and experience.
        
         | mathstuf wrote:
         | Timings and audio handling sound like logic things and not
         | something Rust's advantages over C++ would inherently support
         | better.
         | 
         | If I had to hazard a guess, it's that the C++ projects are just
         | more mature and have seen more "in-the-wild combat" to get to
          | the state they're in now. Other factors that may contribute
          | (in no particular order):
          | 
          |     - existing emulators are Good Enough and polishing a
          |       new one is less "urgent" than other problems in the
          |       current zeitgeist;
          |     - hobby project vs. no-longer-a-hobby project status;
          |       and
          |     - emulator projects in C++ of similar quality are just
          |       as prevalent, but are significantly older and,
          |       instead of being discoverable on GitHub, are
          |       abandoned on Sourceforge or personal websites.
        
           | TonyTrapp wrote:
           | That would be my guess as well. Emulation, interpretation of
           | decades-old binary formats, and similar problems are hard to
           | get right on the first try, they often need years of testing
           | and breaking until they are right. A project that has been
           | around for 20 years will always have an advantage in this
           | space over a newcomer project.
        
           | loup-vaillant wrote:
           | Funny how being on GitHub makes you discoverable, while
           | maintaining your own domain name does not...
           | 
           | Actually no, that's not funny at all.
        
         | elcritch wrote:
         | You should add Nim to the list! There's a few implementations,
         | though I'm not sure how complete they are. As others point out
         | they're often used as ways to learn a language.
         | 
         | Fun writeup: https://itwasscience.com/blog/gameboy Improving
         | RosettaBoy performance: https://forum.nim-lang.org/t/9859
        
         | InfamousRece wrote:
          | I don't think Rust is inherently bad for writing emulators.
          | For
         | example one of the best cycle-accurate IBM XT emulators is
         | written in Rust: https://github.com/dbalsom/martypc
        
         | simias wrote:
         | Most people developing GameBoy emulators these days do it as a
         | toy project, not a serious effort to create an ultra-accurate
         | general purpose emulator. It's like writing a raytracer or a
         | Pong clone.
         | 
         | The best GameBoy emulators like Gambatte predate Rust by almost
         | a decade and are often written in C++. Since GameBoy emulation
         | is pretty much a solved problem there's no strong motivation to
         | innovate in the scene.
         | 
         | I've written several emulators in Rust, I'd argue that it's
         | very well suited to this exercise, but it's also not an area
         | where Rust shines particularly. There's no complex memory
         | management in emulators usually since most memory buffers are
         | just statically allocated constant-size arrays that mimic the
         | physical RAM on the console.
        
         | olafura wrote:
          | So the question is why projects created in a language that
          | has been around since before the devices being emulated are
          | more mature than ones in a language that is 8 years old?
         | 
         | Sorry to be so snarky. It probably depends on people's time
         | working on it and their familiarity with async processes. This
         | is an inherent problem with any media like that. With sound,
         | it's primarily that it can't come before the image but can lag
         | behind the image. Stuff like this.
         | 
         | I don't think any programming language is going help with that
         | because it's how we perceive things, not just how correct your
         | code is.
        
         | alex_lav wrote:
         | > But given the sheer number of comments I've read over the
         | years recommending Rust as a replacement for C++,
         | 
         | > What gives?
         | 
         | This is The Hype Wave, where something is new and exciting so
         | it's recommended over the things of the past without regard to
         | whether it's as stable/full featured/correct. People are
         | excited about Rust so Rust has been jammed into just about
         | everywhere to varying degrees of success/correctness. This is
         | the same as Javascript/Node and Go over the last decade-ish.
         | Zig and Mojo are probably next.
        
         | bowsamic wrote:
         | Which Rust emulators? Which C++ emulators?
         | 
         | Perhaps the C++ emulators were serious, long-standing projects,
         | and the Rust emulators were just hobby projects.
         | 
         | Most hobbyists only care about getting a game on the screen.
        
       | ForkMeOnTinder wrote:
       | There's another strategy that has good storage efficiency and
       | keeps the ability to iterate over elements. vec #1 is a list of
       | tags, vec #2 keeps the byte offset of each element, and vec #3
       | (not really a vec) is the packed variant data pointed to by vec
       | #2. This:
       | 
        | - is half the number of vecs of the author's final solution (3
        | vs 6)
       | 
       | - wastes no bytes for padding (unless needed for alignment)
       | 
       | - lets you iterate in a cache-friendly way since the data is laid
       | out in order in memory regardless of type
       | 
       | - even lets you access items by index in O(1)
       | 
       | Generally it has the same performance characteristics as a
       | Vec<T>, but for heterogeneous data.
        
         | pornel wrote:
          | But if you need to mutate such a collection, you'll end up
          | writing a memory allocator (to handle removals, changes to a
          | larger variant, and fragmentation).
        
         | [deleted]
        
         | dist1ll wrote:
         | Storing byte offset inline, great idea. Thanks for mentioning
         | that.
         | 
         | Just note one disadvantage: if the byte offset is stored in
         | memory, you have a data dependence on the traversal. So even
         | though it's cache-friendly, it may cause serious memory stalls
         | in the processor's pipeline.
        
           | dan-robertson wrote:
           | One trick for this is to leverage the branch predictor to
           | guess where the next item will be (I first saw this trick for
           | linked list traversal as in practice linked lists are often
            | laid out linearly). Something like:
            | 
            |     // or guess based on the size of the current item
            |     let guessed_next_offset =
            |         current_offset + (current_offset - previous_offset);
            |     let actual_next_offset = byte_offsets[++i];
            |     previous_offset = current_offset;
            |     current_offset = guessed_next_offset;
            |     if (current_offset != actual_next_offset)
            |         // need to make sure the compiler doesn't
            |         // 'optimise out' this if and always run the
            |         // below line
            |         current_offset = actual_next_offset;
            |     // ...
           | 
           | In the ordinary case, where the guessed offset is correct,
           | the cpu predicts the branch is not taken, and no data
           | dependency of ... on actual_next_offset is introduced. If the
           | branch is mispredicted then that speculatively executed work
           | with the wrong current_offset is dropped. This is a bit slow
            | but in the case where this happens all the time, the
            | branch predictor just gives you the slightly bad perf of
            | the traversal with the data dependency (computing the
            | guess will be negligible), except you still win whenever
            | the guess is actually right.
        
           | jadbox wrote:
           | I'd love for someone to benchmark these variants to tease out
           | their actual characteristics (particularly for x86).
        
         | loeg wrote:
         | Offsets can add a significant amount of space relative to small
         | T, if their size is not optimized (e.g., 64-bit size_t and
         | uint8_t T). So as long as we are careful about offset size this
         | seems like a reasonable approach.
        
           | dan-robertson wrote:
           | Indeed part of this is pointed out in the video about Carbon
           | linked in the OP: juxtaposition gives you a 'free' pointer
           | going from each node of your data structure and for some
           | tasks you don't really need more than that.
        
       | bfrog wrote:
        | I still have yet to understand whether Zig actually provides
        | memory safety in the same manner Rust does or not; it seems
        | like it's another C-like language without memory safety? Or
        | did I happen to miss that bit?
        
         | dgb23 wrote:
          | Zig provides tools to avoid memory-related bugs, but it
          | doesn't enforce them via compile-time-checked RAII/borrow
          | checking, etc.
          | 
          | Nullability (and error handling) is compile-time checked,
          | though.
        
           | bfrog wrote:
           | To me the biggest win in Rust is just that, the memory
           | related bugs are nearly non-existent. Throw in a group of
           | people working on something and this makes the project move
           | just so much faster in both review time and time hunting
           | bugs.
        
             | cmrdporcupine wrote:
              | IMHO the real big win w/ Rust is not just memory safety
              | from (some) leaks or segfaults but the fact that the
              | borrow checker rules apply also to concurrency-related
              | scenarios, so race conditions and deadlocks become much
              | more difficult to accidentally produce.
             | 
             | Developers can generally be taught to handle memory
             | ownership issues fairly well, esp when helped out with RAII
             | smart pointers, etc.
             | 
             | But in my experience when you throw concurrency in, even
             | very high quality engineers tend to stumble and
             | accidentally introduce race conditions.
             | 
             | In this respect Rust has something over the garbage
             | collected languages, too. Because Java, Go, JS, C#, etc
             | _will_ effectively shield you from  "use after free" but
             | they will absolutely _not_ stop you from passing an
             | unlocked, etc. reference to another thread and screwing
             | yourself up royally; and Rust will.
        
         | [deleted]
        
         | throwawaymaths wrote:
         | Zig gives buffer overflow safety and null pointer safety but
         | not use after free, double free, or stack pointer escape
         | safety.
         | 
          | Zig also gives you memory exhaustion safety, which Rust
          | does not.
        
           | pitaj wrote:
           | What do you mean by memory exhaustion safety?
        
           | oconnor663 wrote:
           | Is there a memory exhaustion scenario where Rust does
           | something other than panic?
        
             | pitaj wrote:
             | You can use the fallible allocation APIs such as
             | `Vec::try_reserve`
        
             | kibwen wrote:
             | Rust-the-language doesn't do any dynamic allocation, so it
             | doesn't exhaust memory at all. The Rust stdlib currently
             | guarantees an abort on OOM, but in the future this will be
             | changed to a panic (which you can always configure to be an
             | abort if you want). Both of these behaviors are memory-safe
             | (it's unclear what "memory exhaustion safety" is referring
             | to).
        
               | throwawaymaths wrote:
                | Also to clarify: there are platforms (like Linux)
                | which de facto have infallible allocation: if you run
                | out of memory, the system will OOM-kill your process,
                | and neither Zig nor Rust will save you from that.
                | 
                | I believe in practice the most common place where
                | fallible allocation is a big deal is in embedded
                | systems programming.
        
               | 10000truths wrote:
               | No language can save you from OOMs, but because Zig
               | pervasively uses explicit memory allocation, it makes it
               | easy to greatly mitigate OOM risks by front-loading all
               | the fallibility:
               | 
               | 1. Calculate and allocate all your required memory
               | immediately upon process startup (including memory from
               | OS resources like sockets)
               | 
               | 2. Call mlockall to prevent the pages from being swapped
               | out
               | 
               | 3. Write "-1000" to /proc/self/oom_score_adj to avoid the
               | OOM killer
               | 
               | 4. Use your preallocated memory as a pool for all further
               | allocations
               | 
               | With the above approach, the application has full control
               | over how to handle application-level OOMs (e.g. applying
               | backpressure to connecting clients, or shrinking non-
               | critical caches) once it is past the start-up stage.
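                | A rough Rust sketch of steps 1, 3, and 4 (step 2's
                | mlockall needs the libc crate, so it is only noted in
                | a comment; the pool size is illustrative):

```rust
use std::fs;

fn main() {
    // 1. Front-load the fallible allocation: grab the whole pool up
    //    front so later failures are application-level decisions, not
    //    surprise OOMs.
    let mut pool: Vec<u8> = Vec::new();
    pool.try_reserve_exact(64 * 1024 * 1024)
        .expect("not enough memory at startup");
    assert!(pool.capacity() >= 64 * 1024 * 1024);

    // 2. (Linux) mlockall(MCL_CURRENT | MCL_FUTURE) would pin the
    //    pages; it needs the libc crate, so it's elided here.

    // 3. Opt out of the OOM killer (needs privileges; we ignore the
    //    error so the sketch still runs unprivileged).
    let _ = fs::write("/proc/self/oom_score_adj", "-1000");

    // 4. From here on, hand out slices of `pool` via a bump/arena
    //    allocator instead of calling the global allocator.
}
```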
        
               | throwawaymaths wrote:
               | > memory exhaustion safety
               | 
                | Sure, but this is more like DDoS mitigation, so it's
                | memory-related safety, not memory safety.
        
           | gpanders wrote:
           | >Zig gives buffer overflow safety and null pointer safety but
           | not use after free, double free
           | 
           | It _can_ provide the latter two through the use of the
           | `GeneralPurposeAllocator`, which tracks UAF and double free.
           | 
           | Stack pointer escape safety is being actively researched
           | (there are a few tracking issues, see [1]). I'm personally
           | interested in this, I've written a decent-ish amount of Zig
           | for toy/personal projects, and anecdotally stack pointer
           | escape is the only type of memory error that has bitten me
           | more than once (though to be fair, one of these cases was
           | calling into a C API, so even Rust wouldn't have helped).
           | 
           | More broadly, the ability to catch all forms of UB in
           | Debug/safety builds is an accepted proposal [2], though
           | whether or not it will be possible in practice is a different
           | (and interesting!) question.
           | 
           | [1]: https://github.com/ziglang/zig/issues/2646
           | 
           | [2]: https://github.com/ziglang/zig/issues/2301
        
             | SkiFire13 wrote:
             | The way `GeneralPurposeAllocator` works is kinda scary
             | though, since it may result in whole memory pages used only
             | for very small allocations, effectively multiplying the
             | memory usage of the program. Also kinda goes against the
             | memory exhaustion safety, since now you're more likely to
             | exhaust memory.
        
             | saagarjha wrote:
             | An allocator that does heap quarantining at the page level
             | is not a "general purpose allocator". It is a debug tool.
        
             | jmull wrote:
             | Yeah, I don't think it's right to say Zig doesn't have use-
             | after-free and double-free safety. If you want that, you
             | can just use a general purpose allocator. Note that you
             | can't forget to choose an allocator since they are
             | explicit.
             | 
             | Is this somehow harder than, say, choosing not to use
             | "unsafe" in Rust?
             | 
             | Maybe all that is missing is a linter to help enforce
             | whatever memory-management policy you've decided on. That's
             | not really needed for small, coherent teams, but would be
             | important for using Zig in larger projects with multiple
             | teams and/or disparate individual contributors. (Perhaps
             | such a thing exists and I just don't know about it.)
             | 
             | You might also be able to use an arena allocator where free
             | is a no-op. That has different tradeoffs, but is also safe
             | for use-after-free and double-free.
             | 
             | As you say, stack escape is the main thing where Zig
             | doesn't have a good memory-safety story yet (at least not
             | that I've heard). I guess there are a few others that
             | concern me when I see them on a list, though I haven't hit
             | them in real life.
        
               | saagarjha wrote:
               | C has use-after-free and double-free safety if you patch
               | out free. Not really a solution, is it?
        
               | jmull wrote:
               | What do you think the difference is between "patching out
               | free" and allocators as a first-class feature? I'll bet
               | you can think of a few rather significant ones if you
               | try.
        
               | throwawaymaths wrote:
               | Static analysis at the IR level would be awesome. It
               | could catch use-undefined, stack escape, and probably
               | uaf/df as well so you don't have to lean on gpa's
               | (potentially expensive) tricks. Plus there are times when
                | you maybe don't want to use the page allocator.
                | 
                | As an aside, I'm not certain I understand how double
                | free is memory-unsafe (in the sense of "causing
                | vulnerabilities").
        
               | saagarjha wrote:
               | A double free breaks invariants for the memory allocator.
               | For example, the freed memory may have been handed out to
               | someone else and if you free it again, it will be marked
               | as unused even though that code relies on their stuff
               | being available. One very classic way of exploiting a
               | double-free is a sequence like this happens:
               | 
               | 1. Some code allocates memory.
               | 
               | 2. The code frees the memory, but keeps a stale reference
               | to it around. It is marked as unused by the allocator.
               | 
               | 3. Some other code allocates memory. Maybe it's reading
               | the password file off of disk. The allocator has some
               | unused memory lying around so it hands it out-but it
               | turns out that this is actually just a reuse of the
               | buffer from earlier. It is now marked as "in use" again
               | by the allocator.
               | 
               | 4. The code from earlier has a bug and frees the
               | allocation again. This means that the allocation is now
               | marked as "unused".
               | 
               | 5. Another allocation request hands out this memory
               | again. Maybe it's a buffer for user input? Well, it's
               | been scribbled all over with other data now.
               | 
               | 6. Someone asks to authenticate and the password checking
               | code gets called. It has the password right here to check
               | against...oh, wait, that memory got freed out from under
               | it and overwritten with some attacker-controlled content!
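                | That sequence can be simulated with a toy free list
                | over chunk indices (not real memory; just to make the
                | aliasing visible - all names are made up):

```rust
// Toy "allocator" that hands out chunk indices from a free list and,
// crucially, does no double-free checking.
struct ToyAlloc {
    free: Vec<usize>,
}

impl ToyAlloc {
    fn new() -> Self {
        ToyAlloc { free: vec![0, 1, 2] }
    }
    fn alloc(&mut self) -> Option<usize> {
        self.free.pop()
    }
    fn free(&mut self, chunk: usize) {
        self.free.push(chunk); // no check that `chunk` is in use!
    }
}

fn main() {
    let mut a = ToyAlloc::new();
    let p = a.alloc().unwrap(); // 1. allocate
    a.free(p);                  // 2. free, but keep stale `p` around
    let q = a.alloc().unwrap(); // 3. "password buffer" reuses chunk
    a.free(p);                  // 4. buggy double free of `p`
    let r = a.alloc().unwrap(); // 5. "user input buffer"
    // q and r are the same chunk: user input now overlaps the
    // password buffer.
    assert_eq!(q, r);
    println!("chunk {q} handed out twice");
}
```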
        
               | jmull wrote:
               | > I'm not certain I understand how double free is memory
               | unsafe (in the sense of "causing vulnerabilities")
               | 
               | Perhaps there are some allocators where doing that hits
               | UB. UB in memory allocation is probably always a memory
               | safety issue. I would say if your code accepts any
               | allocators where double-free could be UB then you've got
               | a safety issue.
        
         | xigoi wrote:
         | > I still have yet to understand if Zig actually provides
         | memory safety in the same manner Rust does or not
         | 
         | It doesn't.
        
           | bfrog wrote:
            | So it's C with nicer macros and maybe some more sensible
            | ways
           | of dealing with arrays and such?
           | 
           | For me that's not enough to move the needle.
        
             | cmrdporcupine wrote:
             | It's more than that in that it has a sane syntax and module
             | and build system.
             | 
             | I'm a Rust person by both day job and hobby, but I can see
             | the niche Zig is in as being quite useful in particular for
             | embedded work.
             | 
             | One thing it definitely has over Rust is way better support
             | for explicit per-structure allocator/allocation management.
             | The allocator_api stuff in Rust has just been sitting
             | unstable for years at this point, while Zig shipped with
             | support for this immediately. Which is key for work in all
             | sorts of systems level contexts.
             | 
             | Probably the language I _want_ is somewhere between the
             | two.
        
               | [deleted]
        
               | bfrog wrote:
               | I mean I really have a hard time going backwards on the
               | whole idea of lifetimes and send/sync type traits which
               | force escape hatches if you want to do something idiotic.
               | The key piece here is, any project of significant size
               | and usage, someone will try to do something stupid. Rust
               | makes the stupid so obvious with unsafe blocks and
               | annotations, that reviewing code from some random person
               | on the internet is made easier. The tooling made the task
               | of reviewing code mostly limited to the code being
               | changed rather than the whole project.
               | 
               | In C/C++ land you need to consider not only the code
               | being changed but also the entire code base and how that
               | code is used. It's not enough to look at the diff. You
               | need to understand the context.
               | 
               | Not to say that isn't also true with Rust to some degree,
               | but the degree is usually in terms of logic and less "is
               | some pointer/reference going to cause a
               | race/segfault/buffer xflow?"
               | 
                | The language itself doesn't define allocation; Box is
                | in the stdlib, which allows no_std libraries to deal
                | with things as I guess most of us would expect. It
                | would be cool to allow for a custom global allocator,
                | though, to enable std on more platforms with better
                | semantics; no disagreement there.
        
           | emmanueloga_ wrote:
            | It doesn't, but it puts a RED BANNER on every place where
            | memory is allocated: you need to explicitly pass an
            | allocator of your choosing whenever allocation happens.
            | It also has no hidden allocations or hidden function
            | calls. So it forces you to think about your memory
            | allocation patterns, and it includes tools to catch
            | memory leaks [1].
           | 
           | 1: https://ziglang.org/learn/samples/#memory-leak-detection
        
             | saagarjha wrote:
             | That doesn't really help.
        
         | [deleted]
        
         | maxmcd wrote:
          | I found this article to be clarifying:
          | https://www.scattered-thoughts.net/writing/how-safe-is-zig/
        
           | bfrog wrote:
            | In effect this answers my question pretty well: the
            | answer seems to be a resounding "no, there's no memory
            | safety", at least in the practical sense of resulting in
            | few to no memory-lifetime errors.
        
       | Andrew018 wrote:
       | [dead]
        
       | pornel wrote:
       | Rust's type system ended up being Turing-complete anyway. It
       | seems like it fell into the same trap as C++ templates: it's an
       | accidental programming language in a programming language, but
       | with way worse syntax.
       | 
        | The lesson for language designers may be that every useful
        | type system is doomed to be Turing complete, so you might as
        | well embrace it from the start instead of trying to make it
        | declarative.
        
         | itishappy wrote:
         | Fascinating! Now I'm curious, what causes this? Where does
         | Turing completeness come from?
         | 
         | Polymorphism? Inference? Higher-kinded types? (Probably not
         | that last one, I don't think Rust has them.)
         | 
          | Put another way, what would it take for a type system to
          | NOT become Turing complete?
        
           | AprilArcus wrote:
           | All you need is a way to branch and a way to loop, so
           | anything with conditional types and recursive types will be
           | Turing Complete
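            | Rust's traits already have both ingredients: recursion
            | through nested types and branching through impl selection.
            | A standard sketch (not from the article) where the type
            | checker computes Peano addition:

```rust
use std::marker::PhantomData;

// Type-level Peano numbers: Succ gives recursion, and choosing
// between the Zero impl and the Succ impl gives branching.
struct Zero;
struct Succ<N>(PhantomData<N>);

// Collapse a type-level number back to a runtime usize.
trait ToUsize {
    const VALUE: usize;
}
impl ToUsize for Zero {
    const VALUE: usize = 0;
}
impl<N: ToUsize> ToUsize for Succ<N> {
    const VALUE: usize = 1 + N::VALUE;
}

// Addition computed entirely by the trait solver.
trait TypeAdd<Rhs> {
    type Sum;
}
impl<Rhs> TypeAdd<Rhs> for Zero {
    type Sum = Rhs; // 0 + r = r
}
impl<N: TypeAdd<Rhs>, Rhs> TypeAdd<Rhs> for Succ<N> {
    type Sum = Succ<N::Sum>; // (n + 1) + r = (n + r) + 1
}

fn main() {
    type Two = Succ<Succ<Zero>>;
    type Three = Succ<Two>;
    type Five = <Two as TypeAdd<Three>>::Sum;
    // The type checker did the arithmetic; we just read it out.
    assert_eq!(<Five as ToUsize>::VALUE, 5);
}
```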
        
           | ynik wrote:
           | Turing completeness doesn't have to come from a single
           | feature, it often comes from a combination of features.
           | 
           | e.g. the interaction between subtyping (e.g. inheritance) and
           | generics (with variance) is tricky:
           | https://www.cis.upenn.edu/~bcpierce/papers/variance.pdf
           | 
            | It can be highly nontrivial to tell whether a language
            | actually has a Turing-complete type system: the 2007
            | Kennedy & Pierce paper made it seem likely that Java's
            | type system was Turing complete, but it took until 2016
            | for that to finally be proven
            | (https://arxiv.org/abs/1605.05274).
            | 
            | > What would it take for a type system to NOT become
            | Turing complete?
           | 
            | An analysis of all possible interactions between all
            | features in the type system, building a formal proof that
            | the type system is not Turing complete. This is not
            | really realistic for the style of complex generic type
            | systems that programmers are now used to; it would need
            | to be a vastly simpler language.
        
         | avgcorrection wrote:
         | Surely they knew that it would become a programming language in
         | its own right. That's the fate of all such statically typed
         | languages which give you type-level power but that aren't
         | dependently typed. (Of course dependently typed languages might
         | end up with other languages like its own _tactics_ language.)
        
           | insanitybit wrote:
           | I don't think anyone has ever been surprised to find that a
           | type system (with inference) is turing complete.
        
         | baq wrote:
          | it's easier to make something Turing-complete than not...
          | the most interesting example is probably Rule 110.
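          | Rule 110 is tiny to implement at the value level, which is
          | part of why systems stumble into Turing completeness so
          | easily - one step is just an 8-entry lookup table (a sketch;
          | the row width and seed are arbitrary):

```rust
// One step of elementary cellular automaton Rule 110 on a circular
// row: each new cell is looked up from its 3-cell neighborhood in the
// bits of the number 110 (0b01101110).
fn step(cells: &[u8]) -> Vec<u8> {
    let n = cells.len();
    (0..n)
        .map(|i| {
            let l = cells[(i + n - 1) % n];
            let c = cells[i];
            let r = cells[(i + 1) % n];
            let idx = (l << 2) | (c << 1) | r;
            (110u8 >> idx) & 1
        })
        .collect()
}

fn main() {
    // Start from a single live cell and print a few generations.
    let mut row = vec![0u8; 16];
    row[15] = 1;
    for _ in 0..8 {
        let line: String =
            row.iter().map(|&c| if c == 1 { '#' } else { '.' }).collect();
        println!("{line}");
        row = step(&row);
    }
}
```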
        
           | wredue wrote:
           | The point is that it'll end up that way, so just embrace it
           | from the start.
        
             | baq wrote:
             | the lisp way. the problem is it results in software which
             | is difficult to reason about due to all the compile-time
             | execution, even if you have state-of-the-art compile-time
             | debuggers (if any language has them, it is lisp).
        
               | wredue wrote:
                | I haven't so far found any Zig code terribly hard to
                | reason about, and it's important to know that
                | compile-time type crafting (currently) doesn't
                | support decls. Although you can madness your way
                | around that, there are idiomatic ways to manage this.
               | 
               | Comptime is definitely very powerful, and even the top
               | people using zig frown upon "magic".
               | 
               | But to be completely honest, "magic" shit just happens
               | all the time if you enable it. Rust macro abuse to get
               | "magic" comes to mind. If it happens in rust, it'll
               | certainly happen in zig.
        
         | Ar-Curunir wrote:
         | That something can be Turing-complete in the worst case doesn't
         | mean that you hit those worst-cases on a frequent, or even
         | occasional basis.
        
         | marcosdumay wrote:
         | > instead of trying to make it declarative
         | 
         | Wait, no. A language being Turing complete and declarative are
         | completely independent things.
        
           | dgb23 wrote:
           | That's the point!
        
         | hnfong wrote:
         | > Rust's type system ended up being Turing-complete anyway. It
         | seems like it fell into the same trap as C++ templates
         | 
         | I hear it is a common trap.
         | 
         | https://aphyr.com/posts/342-typing-the-technical-interview
        
         | dist1ll wrote:
          | Author here. That's a really good observation, kornel.
          | Rust's philosophy of avoiding post-monomorphization errors
          | at all costs is usually sung in high praise (followed by
          | ridicule of C++ template errors) - but it comes at a cost.
          | And when you stray from the declarative way and start
          | plastering const generic bounds everywhere, you really feel
          | the pain.
          | 
          | Anyway, I think engaging with this topic as a community is
          | super important, because that's the only way to push PLs
          | forward and explore this massive space.
        
         | packetlost wrote:
         | This is sort of the approach that I've considered. Type systems
         | are just predefined code/behavior baked into a language, but
         | there's _technically_ nothing stopping you from arbitrarily
          | extending the compiler, like Lisps do with macros, to
          | support a type system that's written in the language
          | itself.
        
           | naasking wrote:
           | Type Systems as Macros:
           | 
           | https://www.khoury.northeastern.edu/home/stchang/pubs/ckg-
           | po...
        
         | mlochbaum wrote:
         | I did this! Singeli is an Elixir-like compile-time language
         | where types and variables are first-class values, on top of
         | C-ish semantics that are meant to be more like "portable
         | assembly". It's designed for high-performance code where you
         | have to be able to control the specific instructions, but the
         | overall strategy could be useful in other domains too.
         | 
         | https://github.com/mlochbaum/Singeli
         | 
         | And a podcast on it came out Friday:
         | 
         | https://www.arraycast.com/episodes/episode62-what-is-singeli
        
           | mk12 wrote:
           | I listened to that, it was good! I was surprised there is an
           | entire podcast about array programming. Looking forward to
           | future episodes, and also to trying AoC with BQN again in a
           | few months.
        
         | [deleted]
        
         | Aardwolf wrote:
         | Genuine question:
         | 
          | How do type systems that are so abstract and complex as to
          | become Turing complete actually help?
         | 
         | I definitely find it extremely helpful to be able to write
         | containers of any type (like std::vector<T>), that saves a ton
         | of code duplication, but beyond that what more is needed and
         | why? What we're competing with here is: just write a function
         | that operates on the data types you want and gets the job done.
         | 
         | You're writing code for the CPU that has to actually do
         | something, on actual known types.
         | 
         | What programming task is simplified by having an "any" type or
         | a type system that allows you to write pong-played-turn-by-
         | turn-using-compiler-error-messages? It's cool that you can have
         | an "any" type just like universal sets in set theory, but what
         | real-life programming scenario does this simplify (you can
          | already write containers that can contain anything you want
          | without using such a thing as an "any" type)?
         | 
          | At least not the kind of programming tasks I do; but
          | admittedly I think fairly low-level and prefer my types to
          | have exact
         | known amounts of bits, known signed integer convention and
         | endianness so I can efficiently use shifts and get the bits I
         | need, preferably with as little undefined behavior as possible.
         | 
         | Asked differently: If one were to design a programming language
         | that only has basic types (primitives, structs/classes, ...)
         | and templates to allow functions/classes to operate on any type
         | (but not more than that; substitute template type with the
         | actual type, compile this, nothing more), what feature will
         | users of the language be missing and complain about?
        
           | munificent wrote:
           | _> Asked differently: If one were to design a programming
           | language that only has basic types (primitives, structs
           | /classes, ...) and templates to allow functions/classes to
           | operate on any type (but not more than that; substitute
           | template type with the actual type, compile this, nothing
           | more), what feature will users of the language be missing and
           | complain about?_
           | 
           | This is a really good question. There are some languages that
           | work as you describe: SML and some others in that family.
           | There are generic functions and types, but the type
           | parameters are basically just placeholders. You can't _do_
           | anything with a value whose type is a type parameter, except
           | store it and pass it to things that also take type
           | parameters.
           | 
           | That gives you enough to write a nice reusable vector type.
           | But it doesn't let you easily write a nice reusable hash
           | table. You can, but users have to pass in an explicit hash
           | function that accepts the key type every time they create a
           | hash table.
           | 
           | It might be nice if a type itself could indicate whether it's
            | hashable and, if so, what its hash function is. Then, if
            | you
           | create a hash table with that key type, it automatically uses
           | the hash function defined by that type.
           | 
           | Now you need some sort of constraints or type bounds in your
           | generics. That's what traits in Rust and bounds in Java and
           | C# give you. (The literature calls it "bounded
           | quantification".) It's a _big_ jump in complexity. But it
           | does mean that now you can call functions /methods on
           | arguments/receivers whose type is a type parameter, and those
           | calls can be type checked.
           | 
           | Bounds are themselves types, so what kinds of types can you
           | use in bounds? Can they be generic? If so, what kinds of type
           | arguments are allowed? Can you use type parameters from the
           | surrounding type?
           | 
            | For example, which of these are OK and which aren't
            | (using Java-ish syntax):
            | 
            |     class A<T extends Foo> {}       // 1.
            |     class A<T extends Bar<Foo>> {}  // 2.
            |     class B<T extends B<Foo>> {}    // 3.
            |     class B<T extends Bar<T>> {}    // 4.
            |     class B<T extends B<T>> {}      // 5.
           | 
           | Any kind of bounded quantification will give you 1-3. What
           | about 4 and 5? This is called "F-bounded quantification". Why
           | would you want such a thing?
           | 
           | Collections with fast look-up are important, which is why we
           | extended our generics to enable us to write nice reusable
           | hash tables. But some data types aren't easily hashed but can
           | be easily ordered. A sorted collection is faster than an
           | unsorted one.
           | 
           | How would we write a generic sorted collection? We could
           | require you to always explicitly pass in an ordering function
           | for any given element type, but it would be nice if the
            | element type itself could supply its ordering function.
           | 
           | You could define a "Comparable" interface that a type can
           | implement to support comparing an instance against another
           | object of some type, like:                   interface
           | Comparable<T> {           int compareTo(T other);         }
           | 
           | And then implement it on your type, like:
           | class Color implements Comparable<Color> {           int r,
           | g, b;                int compareTo(Color other) => ...
           | }
           | 
            | In our sorted collection, elements all have the same
            | type, so the bound that we need looks like:
            | 
            |     class SortedCollection<T extends Comparable<T>> { ... }
            | 
            | Notice that we have "T" inside the bound. That's
            | F-bounded quantification.
           | 
            | Using type parameters inside a bound isn't the only place
            | recursive types like this show up. Let's say you wanted
            | to make a _generic_ type comparable. You'd do something
            | like:
            | 
            |     class Pair<T> implements Comparable<Pair<T>> {
            |         T a, b;
            | 
            |         int compareTo(Pair<T> other) => ...
            |     }
           | 
           | Now here, the implements clause is using not just the type
           | parameter of the enclosing type, but the entire type.
           | 
           | We had a couple of fairly modest goals:
           | 
           | * Be able to create reusable hash tables where the hash
           | function is inferred from the key type.
           | 
           | * Be able to create reusable sorted collections where the
           | comparison function is inferred from the element type.
           | 
           | And in order to get there, we needed generics, bounds, and
           | even F-bounded quantification.
           | 
           | Adding even a little more usefulness to our collection types
           | will quickly have us reaching for variance annotations,
           | associated types, and even more exotic stuff.
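            | In Rust the same F-bound hides inside the `Ord` trait
            | (`Ord` requires `PartialOrd<Self>`, i.e. the bound
            | mentions the type itself). A sketch of the sorted
            | collection and the comparable Pair in those terms (type
            | names are made up):

```rust
// `T: Ord` is Rust's spelling of "T extends Comparable<T>": the
// bound is self-referential via Ord: Eq + PartialOrd<Self>.
struct SortedCollection<T: Ord> {
    items: Vec<T>,
}

impl<T: Ord> SortedCollection<T> {
    fn new() -> Self {
        SortedCollection { items: Vec::new() }
    }

    // The comparison is inferred from the element type; callers never
    // pass an ordering function explicitly.
    fn insert(&mut self, item: T) {
        let pos = self.items.binary_search(&item).unwrap_or_else(|p| p);
        self.items.insert(pos, item);
    }
}

// The Pair<T> example: the generic type is comparable whenever its
// parameter is, mirroring `Pair<T> implements Comparable<Pair<T>>`.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Pair<T: Ord> {
    a: T,
    b: T,
}

fn main() {
    let mut c = SortedCollection::new();
    for p in [Pair { a: 3, b: 4 }, Pair { a: 1, b: 2 }] {
        c.insert(p);
    }
    assert_eq!(c.items[0].a, 1); // sorted by T's own ordering
}
```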
        
           | widdershins wrote:
           | It's about when you want to enable optimisations or
           | conveniences depending on the parameterized type.
           | 
           | For example, in the std::vector<T> type, if T supports being
           | moved, you want to use that when growing your vector for
           | performance. If T doesn't support being moved, you will have
           | to copy it instead.
           | 
           | Boom: you've ended up with template metaprogramming.
        
           | gpderetta wrote:
           | > You're writing code for the CPU that has to actually do
           | something, on actual known types.
           | 
            | (Partial) specialization, the feature used to get
            | templates to actually do something on actual types, is
            | what principally allows templates to be Turing complete.
        
         | lenkite wrote:
          | Basically, design dependent types into the language right
          | from the beginning (like Idris) - such that types are
          | first-class and can be computed and manipulated like any
          | other language construct.
        
         | bhouston wrote:
         | > Rust's type system ended up being Turing-complete anyway. It
         | seems like it fell into the same trap as C++ templates: it's an
         | accidental programming language in a programming language, but
         | with way worse syntax.
         | 
         | Most anything in software can easily end up Turing-complete
         | though. It isn't that high of a bar unfortunately. I think most
         | mature type and template systems end up as Turing-complete. To
         | single out Rust for this is to ignore that this is just
         | standard fare for mature typing systems.
         | 
         | TypeScript's types are Turing complete:
         | https://github.com/microsoft/TypeScript/issues/14833
         | 
         | Python's type hints are Turing complete:
         | https://arxiv.org/abs/2208.14755
         | 
         | Java generics are Turing complete:
         | https://arxiv.org/abs/1605.05274
        
           | egl2021 wrote:
           | "It isn't that high of a bar unfortunately." --> "It isn't
           | that high of a bar."
        
           | 38 wrote:
           | > Most anything in software can easily end up Turing-complete
           | though.
           | 
          | not really - not if you actually pay attention when you're
          | designing the type system:
           | 
           | https://arxiv.org/pdf/2005.11710.pdf
        
             | bhouston wrote:
             | My experience with Turing-completeness in any custom DSL or
             | similar is that you will generally get Turing-complete
             | systems unless you work really really hard to avoid them.
             | And it only takes one wrong move by anything anywhere in
             | the system, and you'll end up with Turing-completeness.
             | 
             | Instead of trying to avoid Turing-completeness, just expect
             | it, embrace it and instead deal with its consequences to
             | contain the fallout.
        
               | 38 wrote:
               | > Instead of trying to avoid Turing-completeness, just
               | expect it, embrace it and instead deal with its
               | consequences to contain the fallout.
               | 
                | this seems like such a cop-out. It's like saying
                | "oh, failure is a given, so don't even try to
                | succeed". At least currently, Go generics are NOT
                | Turing complete; the generics were designed in part
                | specifically to avoid that. So just because Rust (and
                | others) failed doesn't mean it's impossible.
        
               | insanitybit wrote:
               | I don't think most people care at all that their type
               | system is turing complete. It basically never comes up.
        
               | taeric wrote:
               | Agreed. Adding to this point, why should I care? Am I
               | missing anything obvious on why you don't want it?
        
               | insanitybit wrote:
               | Not really. In theory it means your compiler might never
               | terminate... that does not usually happen.
        
               | bhouston wrote:
                | > this seems like such a cop out. it's like saying "oh
                | failure is a given, so don't even try to succeed".
               | 
               | It is a pragmatic cop-out yes. I found it is easier to
               | make progress by assuming that you'll get to Turing
               | Complete rather than investing in the time to avoid it. I
               | found that the downsides of Turing Completeness are
               | usually overhyped or primarily theoretical.
               | 
               | > so just because Rust (and others) failed, doesn't mean
               | its impossible.
               | 
               | It definitely is not impossible. But I don't know if it
               | is worth it.
               | 
               | In the end I want fast compile times, type safety, easy
               | to maintain code and good error messages. I do not care
               | if the type system is Turing Complete.
        
               | 38 wrote:
               | > In the end I want fast compile times, type safety, easy
               | to maintain code and good error messages. I do not care
               | if the type system is Turing Complete.
               | 
                | I think the problem is that some languages like Rust go
                | too far with generics, which is probably what makes the
                | type system Turing complete. for example, this is valid
                | Rust code:
                | 
                |     let mut handles = Vec::new();
                |     for i in 0..10 {
                |         let handle = do_work(i);
                |         handles.push(handle);
                |     }
                | 
                | but you have to follow the code all the way to "do_work"
                | before you ever find the type of anything. Go does not
                | allow this. you need to either declare a concrete type:
                | 
                |     var handles []int
                | 
                | or an explicit generic type:
                | 
                |     type slice[T any] []T
                |     var handles slice[int]
                | 
                | I think Rust is lose-lose, because the underlying type
                | implementation is Turing complete, and I would argue the
                | code is actually less readable because of the overuse of
                | type inference.
        
               | bhouston wrote:
               | > I think Rust is lose lose, because the underlying type
               | implementation is Turing complete
               | 
               | I am not a Rust coder so I can't really comment.
               | 
               | Currently, I do TypeScript, which also has a Turing
               | complete type system, and I love it. Of course all things
               | in moderation. Even though I could make a completely
               | obtuse type design for projects, I try to not write code
               | that others can not understand.
        
             | insanitybit wrote:
             | "if you actually pay attention" or, in other words, if you
             | produce novel research into the area that's worthy of
             | publishing, and noting that this is only just the start and
             | has its own limitations:
             | 
             | > The cost is that of requiring a whole program analysis
             | and disallowing programs that would result in infinite
             | instantiations (Section 5.3). Clearly, this is the
             | beginning of the story, not the end.
        
               | [deleted]
        
           | dgb23 wrote:
           | The point is that Zig basically circumvents issues
           | surrounding that, by introducing "comptime", where you can
           | just write regular Zig to achieve the same things.
           | 
           | The article showcases a nice example of having this very
           | direct power.
           | 
           | But it really comes up more often than you think as soon as
           | you actually have it.
           | 
           | It's easy in Zig to allocate precisely based on computed
           | values and then have the sizes as part of your types etc. It
           | all falls out of some simple ideas and it's all just regular
           | Zig code.
           | 
           | "Types in Zig are values of the type type" from:
           | https://ziglearn.org/chapter-1/
           | 
           | So instead of making it hard to write incorrect programs, Zig
           | makes it easy to write correct programs.
        
             | littlestymaar wrote:
             | The problem is that it ends up being dynamically-typed,
             | like C++ templates. See:
             | https://nitter.net/pcwalton/status/1369114008045772804
             | 
             | > So instead of making it hard to write incorrect programs,
             | Zig makes it easy to write correct programs.
             | 
              | Well maybe in theory, but the current state of Zig is that
              | it makes it hard to write programs no matter how correct,
              | because the compiler keeps crashing ¯\_(ツ)_/¯
        
               | wredue wrote:
                | What are you talking about? The tagged versions are
                | usually quite stable and also get bug-fix follow-ups.
               | 
                | If you follow master, you'll occasionally run into
                | crashes, which is true of any developing language. If you
               | don't want that, follow tagged versions.
        
             | ksec wrote:
              | > instead of making it hard to write incorrect programs,
              | Zig makes it easy to write correct programs.
             | 
              | Or maybe rephrasing it (to avoid the word "instead",
              | which may anger Rust supporters):
             | 
             |  _Rust Makes it hard to write incorrect programs, Zig makes
             | it easy to write correct programs._
             | 
             | I think this single sentence captures the philosophical
             | difference between Rust and Zig. And of course there is no
             | right or wrong in philosophy.
        
               | dgb23 wrote:
               | I fully agree. It's an interesting trade off that is
               | worth thinking about.
        
             | bhouston wrote:
             | Here is a really good link that shows the power of Zig's
             | comptime:
             | 
             | https://kristoff.it/blog/what-is-zig-comptime/
        
             | chrismorgan wrote:
             | I am certainly sometimes envious of comptime and what it
             | makes practical, but it's worth noting that it results in
             | dynamically-typed generics, whereas Rust goes for
             | statically-typed generics, which is in keeping with its
             | goals. There are some significant general maintainability
             | improvements in statically-typed generics; when you use
             | dynamic generics, subtle changes in one place can cause
             | obscure compile errors in a completely different and
             | seemingly unrelated place (commonly called _post-
              | monomorphisation errors_); this doesn't happen with static
             | generics.
             | 
             | So... I'm not sold on your wording that it's
             | _circumventing_ issues, as it's choosing a different set of
             | trade-offs. In shedding types-are-a-language-of-their-own,
             | you also shed confidence about what's a breaking change,
             | and make call sites more fragile. Decide for yourself
             | whether it's worth it.
        
               | anonymoushn wrote:
               | What do you mean by dynamically-typed generics and
               | statically typed generics here? I've looked up "post-
               | monomorphization errors" and found some things about
               | assertions about generic types failing because of the
               | choices made by the user who passed in types or constants
               | that do not work with the generic code. It seems like Zig
               | libraries have the option of generating the errors at the
               | right place if they place their assertions in the
               | function that returns the type to the user, but they also
               | have the option of generating the errors in the wrong
               | place if they place their assertions in methods of the
               | type.
               | 
               | > So... I'm not sold on your wording that it's
               | circumventing issues, as it's choosing a different set of
               | trade-offs. In shedding types-are-a-language-of-their-
               | own, you also shed confidence about what's a breaking
               | change, and make call sites more fragile. Decide for
               | yourself whether it's worth it.
               | 
                | Client code can just look at all the members of all the
                | structs, so there's not much hope of using compiler-
                | adjacent tooling to enforce that changes cannot break
                | any client code.
        
               | dgb23 wrote:
               | I'm still in the honeymoon phase with this language and
               | learning, but I agree it's a trade off.
               | 
               | For example your LSP isn't going to help you as much
               | while you edit code.
               | 
               | However being able to express arbitrary compile time
               | constraints and pre-compute stuff without having to go
               | through a code generation tool is really powerful. You
               | can actually use all the knowledge you have ahead of time
               | as long as you can express it in Zig.
               | 
               | So far it seems like Zig is carving out a very strong
               | niche for itself.
        
               | convolvatron wrote:
               | is there anything fundamental about using the same
               | language at compile time to generate fully static typing
               | that is boiled away? I don't know, but it doesn't seem
               | so?
        
         | jjtheblunt wrote:
         | Can you show how you'd implement a Turing machine in Rust
         | types?
         | 
         | I am not seeing it, so am likely overlooking that.
        
         | justinpombrio wrote:
          | Agreed. The striking thing is that Rust's type system (and
          | many other languages' type systems) is a parallel language to
          | its runtime language. Parameterized types are like functions;
         | instantiating a type parameter is like a function call; `impl
         | ... where` does pattern matching. You get a pure functional
         | language based on pattern matching and implemented with
          | memoization. These type systems are often Turing complete _on
          | purpose_, yet it's surprisingly hard to get an infinite loop.
          | Like, both Rust's and Java's type systems are Turing complete,
          | yet I've never hit an infinite compilation loop in either by
          | accident.
         | 
         | For more on type systems as programming languages:
         | https://ductile.systems/oxidizing-the-technical-interview/ plus
         | its links at the top.
         | 
         | What I would like to see is a programming language where the
         | runtime language and the comptime language are the same, or
          | nearly the same, _and where the comptime language is type
          | safe_. Zig isn't this: its `comptime` language is essentially
          | dynamically typed. In Zig, if a function `f` takes a `comptime`
          | argument `t: Type`, it can call `.print()` on it, and the
          | compiler will just assume that that's fine. If you call `f`,
          | you had better read the docs, because the _type system_ won't
          | tell you that `t` needs to have a `print()` method. If you're
          | calling `f` directly, that's pretty straightforward, but the
          | trouble comes when you actually call `h`, and `h` looks at the
          | value of some string and based on that decides to call `g`, and
          | `g` calls `f`, and the docs for `h` weren't entirely clear, so
          | now you're seeing an error in `h`, which you didn't even call.
          | Instead, if `f` is going to call `.print()` on a `t`, then its
          | argument `t` can't just be a `Type`; the compiler should check
          | that it's a `Type with method .print() -> String`. This
          | requirement would then flow to `g` and `h`, so the type
          | signature for `h` is guaranteed to tell you that `t` is
          | required to have a `print()` method.
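In Rust terms, the "Type with method .print() -> String" requirement described above is an ordinary trait bound. A minimal sketch (the `Print` trait and `Point` type are illustrative, not from the thread):

```rust
// The signature of `f` itself states what `T` must provide, so a caller
// that passes a type without `print()` gets an error at the call site,
// checked against the bound rather than at instantiation time.
trait Print {
    fn print(&self) -> String;
}

fn f<T: Print>(t: &T) -> String {
    t.print()
}

struct Point {
    x: i32,
    y: i32,
}

impl Print for Point {
    fn print(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

fn main() {
    assert_eq!(f(&Point { x: 1, y: 2 }), "(1, 2)");
}
```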
         | 
         | For more on merging runtime and comptime languages in a type
         | safe way, see 1ML: https://people.mpi-sws.org/~rossberg/1ml/
         | 
         | EDIT: Deleting my criticism of C++ templates lest it distract
         | from the more substantial things I had to say above.
        
           | gpderetta wrote:
           | > What I would like to see is a programming language where
           | the runtime language and the comptime language are the same,
           | or nearly the same, and where the comptime language is type
           | safe
           | 
            | I have approximately 0 knowledge of it, but I think
            | Template Haskell should do that.
        
             | justinpombrio wrote:
             | Nope. Haskell's type system guarantees that code you
             | construct in Template Haskell is well-formed, but the code
             | you construct is only type checked once, at the end. So if
             | you have a function that constructs code, and the code it
             | constructs has a type error in it, you won't find out
             | unless you call the function. Just like Zig.
        
           | avgcorrection wrote:
           | > What I would like to see is a programming language where
           | the runtime language and the comptime language are the same,
           | or nearly the same, and where the comptime language is type
           | safe.
           | 
            | MetaOCaml might be the closest one.
        
           | zozbot234 wrote:
           | > and where the comptime language is type safe
           | 
           | You need dependent types in order to do this, which means
           | doing away with Turing-completeness. (Moreover, the principle
           | 'Type is of type Type' as found in Zig comptime leads to
           | type-unsafety. So you need to replace that with some notion
           | of universes.)
        
             | justinpombrio wrote:
              | Well, you can have dependent types but limit their
              | expressiveness. Rust, for example, has dependent (comptime)
              | types, but they're extremely limited:
              | 
              |     fn foo<const N: usize>() -> [f32; N]
             | 
             | I imagine having arbitrary comptime code, but more limited
             | use of comptime values in type position.
             | 
             | Also, do you know the exact issue with "Type is of type
             | Type"? I know that can lead to _non-termination_, and non-
             | termination completely breaks proof assistants. For
              | example, you can prove `False` with:
              | 
              |     fn make_false() -> False { return make_false(); }
             | 
             | But if you're not building a proof assistant, a function
             | like `make_false()` is fine. Does it lead to any additional
             | problems?
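For reference, a compilable version of that const-generic signature (the function name `zeros` is illustrative):

```rust
// Rust's restricted "comptime" values: const generics. N is known at
// compile time, but can only be used in a few positions (array lengths,
// other const arguments), unlike Zig's arbitrary comptime code.
fn zeros<const N: usize>() -> [f32; N] {
    [0.0; N]
}

fn main() {
    let a: [f32; 4] = zeros(); // N = 4 is inferred from the annotation
    assert_eq!(a.len(), 4);
    assert_eq!(a, [0.0; 4]);
}
```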
        
         | throwaway894345 wrote:
         | I think there's still value in pushing people towards something
         | declarative. For example, a static type system might have an
         | `any` type for things that aren't expressible statically but
         | that doesn't mean type system designers should embrace dynamic
         | typing from the start instead of trying to make it static.
         | Similarly, it makes sense to have panics/exceptions in a
         | language as an escape hatch for errors where the application
         | can't reasonably do anything gracefully, but it's (arguably) a
         | bad experience to take that to the extreme and panic for _all_
         | errors.
         | 
         | I think there's value in _accounting for the possibility_ that
         | there will be edge cases that preclude hard-and-fast rules like
         | "purely static" or "purely declarative", but I dislike the
         | philosophy of projecting the 1% use case (e.g., "dynamic" or
         | "turing complete") onto the 99% use case (where e.g., static
         | and declarative would be ideal). I like when languages design
         | for the 99% case and allow for escape hatches for the remaining
         | 1% with the understanding that these escape hatches are
         | intended to be used judiciously.
         | 
         | To put it differently, embrace that there may be escape hatches
         | in the initial design, but prefer to think of them as "escape
         | hatches" with the entailed understanding that they should be
         | rarely used.
        
         | haberman wrote:
         | 100 times this. C++ is progressively making more and more of
         | the language and stdlib available at compile time (constexpr,
         | consteval, constinit). Once this is accomplished, there will be
         | _two_ very complicated Turing-complete programming languages
         | available at compile time, one that is convenient to program
         | (C++) and one that is not (C++ templates).
         | 
         | Zig jumps straight to the finish line by making the main
         | language available at compile time out of the gate, and by
         | using it as the "generics language", Zig's generics are both
         | simpler and more powerful.
        
       | fweimer wrote:
       | How does this AoVA data structure work in practice? Don't you
       | lose index-based access, in the sense that arithmetic on indices
       | is potentially no longer meaningful in the array sense? Iteration
       | will not preserve insertion order.
       | 
       | I think TLV (tag-length-value; length can be implied by the tag)
       | is more commonly used in this context due to its better caching
        | properties, and at least it provides meaningful forward
       | iteration. See getdents/inotify/Netlink messaging.
        
         | LegionMammal978 wrote:
         | > How does this AoVA data structure work in practice? Don't you
         | lose index-based access, in the sense that arithmetic on
         | indices is potentially no longer meaningful in the array sense?
         | Iteration will not preserve insertion order.
         | 
         | The caption to figure 4 suggests that the AoVA pattern is not a
         | good fit if you need to retain the full order of inserted
         | elements:
         | 
         | > Compared to the SoA layout from before, we have a partial
         | order instead of a total order. So upon insertion, we get back
         | a tagged index that holds both the enum tag and the index in
         | the particular variant array.
         | 
         | So by my reading, ordered access is considered out of scope
         | here. (I suppose you could recover ordered iteration by storing
         | a global index in each element, but that still wouldn't help
         | with ordered random access, and it would likely lead to some
         | really branchy code.)
        
         | catgary wrote:
         | Yeah I imagine an array write that changes the type of an index
         | would be insanely expensive...
        
           | Aransentin wrote:
           | In practice, changing tag type is rarely done.
           | 
           | In any case it wouldn't be particularly expensive; you'd have
           | to make a new member of the destination type which is as
           | expensive as an append. The "hole" that remains in the
           | original vector can be filled in constant time by taking the
           | last member of it and moving it there, then shrinking its
           | size by 1.
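A minimal Rust sketch of that constant-time hole-filling (the two vectors and their values are illustrative; Rust's `Vec::swap_remove` implements exactly the move-last-into-hole trick described above):

```rust
fn main() {
    // Two stand-ins for per-variant arrays.
    let mut ints: Vec<i64> = vec![10, 20, 30, 40];
    let mut floats: Vec<f64> = vec![1.5];

    // "Retag" the element at index 1: swap_remove fills the hole with
    // the last element in O(1), then we append to the destination array.
    let moved = ints.swap_remove(1);
    floats.push(moved as f64);

    assert_eq!(ints, vec![10, 40, 30]); // constant time, order not preserved
    assert_eq!(floats, vec![1.5, 20.0]);
}
```

Note that this is why the layout only gives you a partial order: the swap invalidates the index of whichever element was last in the source array.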
        
             | MobiusHorizons wrote:
             | I'm sure there are classes of problem where that's not
             | unusual. Even in the stated example of an AST, there are
             | use-cases for code-fixers (eg lint fixers) that operate on
             | the AST.
        
               | dan-robertson wrote:
               | Don't those things usually work on concrete syntax trees?
               | And shouldn't the solution be to use the right data
               | structure for the job (using language feature to make
               | this easy) rather than pushing one system (e.g. the
               | compiler) to be worse for the sake of another tool (the
               | linter)?
        
         | dylukes wrote:
         | AoVA lacking its own total ordering on indices might be a
         | problem for some use cases, but not for the one that is
         | suggested here: AST nodes.
         | 
         | In these cases, the arrays can be just one component (could
         | think of it as an arena) of a heap-ish structure. [1]
         | 
         | The cost is that your indices now need to be two dimensional
         | (tag_idx, va_for_tag_idx). But the number of tags is known at
         | compile time and you can optimize storage by packing so that
         | tag_idx is the upper 4-5 bits and va_for_tag_idx uses the rest.
         | 
         | See: [1]
         | https://www.cs.cornell.edu/~asampson/blog/flattening.html
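A sketch of that packed two-dimensional index in Rust (the 4-bit tag width and all names are illustrative):

```rust
// Pack the enum tag into the top 4 bits of a u32 and the per-variant
// index into the remaining 28 bits, as suggested above.
const TAG_BITS: u32 = 4;
const IDX_BITS: u32 = 32 - TAG_BITS;
const IDX_MASK: u32 = (1 << IDX_BITS) - 1;

fn pack(tag: u32, idx: u32) -> u32 {
    assert!(tag < (1 << TAG_BITS) && idx <= IDX_MASK);
    (tag << IDX_BITS) | idx
}

fn unpack(packed: u32) -> (u32, u32) {
    (packed >> IDX_BITS, packed & IDX_MASK)
}

fn main() {
    let p = pack(5, 123_456);
    assert_eq!(unpack(p), (5, 123_456));
}
```

With 4 tag bits you get up to 16 variants and ~268M elements per variant array; the split is a tunable trade-off.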
        
         | skybrian wrote:
         | > upon insertion, we get back a tagged index that holds both
         | the enum tag and the index in the particular variant array.
         | 
         | These are essentially pointers. If you want to iterate, you
         | store the pointers in an array in the order you want to use.
         | It's the same thing a program would do if it allocated memory
         | from a heap.
         | 
         | Storing things based on their size is also done by garbage
         | collectors and general-purpose allocators. They might get some
         | efficiency from knowing all possible object sizes, though.
         | Also, like an arena, they could gain some efficiency from
         | having a simpler way of deallocating.
        
       | wyldfire wrote:
       | > In it he describes how a parsed clang AST regularly consumes
       | 50x more memory than the original source code!
       | 
       | I guess that seems like a lot but the context I'm missing is -
       | how good could it get? I mean, if you want to preserve source
       | locations of each token and you need to encode enough information
       | in the AST to properly recover, what's the ideal growth factor
       | over the original? 1.5x? 15x?
        
         | anonymoushn wrote:
         | As a point of comparison, I think simdjson tape is only ~3x
         | larger than the original document. Much of this could be saved
         | if one did things like stuffing numbers into only one tape slot
         | or not copying strings that have no escape sequences in them
         | (instead referencing their location in the original document).
         | 
            | The largest overhead you could achieve is apparently 8x, for
            | a document that is mostly the characters "[]" or one that is
            | mostly the characters "0,"
        
           | halayli wrote:
           | There's a ton more context that needs to be held in a clang
            | AST node compared to a JSON doc. Just to mention a few: file
            | source, token range, expr/stmt/decl nodes and their
            | relations, macro expansion info, tons of diagnostic state to
            | be able to walk back to the original source token, and
            | padding added by the compiler in those data structures.
        
         | cryptonector wrote:
         | If you can achieve a memory savings of 30%, say, that would be
         | pretty big news. If you can only achieve a 30% memory savings
         | by doing things to the compiler that make it harder to work on
         | the compiler in the future, then it's probably not worthwhile.
         | But if you can achieve an 80% memory savings by doing violence
         | to the compiler, that might be worth the trouble.
         | 
         | As for what's an ideal source->AST expansion factor for a
         | language that has a user-friendly compiler that's also compiler
         | dev-friendly, that's hard to say. Clearly 50x is workable.
         | 
         | In TFA the 50x expansion factor is used as a motivator for
         | automating a particular type of optimization. It'd be very
         | interesting to see a Rust vec-of-enums that automatically
         | deconstructs enum values into tags and opaque values that it
            | could store in a struct-of-arrays like TFA does in Zig. Only
            | a few places would need to bury unsafe code for this.
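A hand-rolled sketch of that tag/payload split for a small enum (all names are illustrative; a real library would derive this, and this toy version avoids unsafe entirely because each payload fits in a u64):

```rust
enum Node {
    Num(f64),
    Ident(u32), // e.g. an interned-string id
}

#[derive(Clone, Copy)]
enum Tag {
    Num,
    Ident,
}

#[derive(Default)]
struct NodeSoa {
    tags: Vec<Tag>,     // one small tag per node, no per-element padding
    payloads: Vec<u64>, // opaque payload bits per node
}

impl NodeSoa {
    fn push(&mut self, n: Node) {
        match n {
            Node::Num(x) => {
                self.tags.push(Tag::Num);
                self.payloads.push(x.to_bits());
            }
            Node::Ident(i) => {
                self.tags.push(Tag::Ident);
                self.payloads.push(u64::from(i));
            }
        }
    }

    fn get(&self, i: usize) -> Node {
        match self.tags[i] {
            Tag::Num => Node::Num(f64::from_bits(self.payloads[i])),
            Tag::Ident => Node::Ident(self.payloads[i] as u32),
        }
    }
}

fn main() {
    let mut soa = NodeSoa::default();
    soa.push(Node::Num(2.5));
    soa.push(Node::Ident(7));
    assert!(matches!(soa.get(0), Node::Num(x) if x == 2.5));
    assert!(matches!(soa.get(1), Node::Ident(7)));
}
```

A generic version for arbitrary enums is where the unsafe (or a proc macro) would come in; this fixed-enum sketch only shows the layout idea.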
        
           | fnord123 wrote:
           | Maybe I'm just dumb but isn't the endgame of vec of enums
           | going to be an ECS?
        
         | [deleted]
        
         | dan-robertson wrote:
         | I think you should just watch the linked talk. It's excellent.
          | Though they don't put exact numbers on it, iirc (perhaps too
          | early to be confident - the numbers might be small due to
          | missing data they don't yet realise they'll need).
        
       | zoogeny wrote:
       | As an aside and unrelated to the content of the article ... but
       | the form of the article, with footnotes in the margin, is really
       | useful.
        
         | dist1ll wrote:
         | Thanks! It's inspired by this
         | https://edwardtufte.github.io/tufte-css/
        
           | ammar2 wrote:
           | Another quick question for you, what are you using to make
           | those hand-drawn looking diagrams? They look amazing.
        
             | dist1ll wrote:
             | I'm using Excalidraw :)
             | 
              | [0] https://excalidraw.com/
              | [1] https://github.com/excalidraw/excalidraw
        
           | ghosty141 wrote:
           | Thanks for sharing this! Looks fantastic, I'll definitely
           | incorporate this into my own blog once it's finished.
        
             | chrismorgan wrote:
             | If you're interested in the general concept of sidenotes
             | and labelling figures in the margin, here's my take on it
             | for further inspiration:
             | https://chrismorgan.info/blog/2019-website/, with
             | https://chrismorgan.info/blog/rust-fizzbuzz/ exhibiting
             | figures.
             | 
             | Do search around for other sidenote implementations, too. I
             | like the markup and presentation I end up with, but you'll
             | also find other approaches to mobile support in particular,
             | involving things like checkbox hacks to toggle visibility,
             | or just flat-out requiring JavaScript.
        
         | chrismorgan wrote:
         | Generally speaking: sidenotes are excellent, footnotes are
         | fine, endnotes are terrible. On the web, as a typically-not-
         | paged medium, "footnotes" practically always actually means
         | endnotes.
        
         | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-09-18 23:01 UTC)