[HN Gopher] Memory-efficient enum arrays in Zig
___________________________________________________________________
Memory-efficient enum arrays in Zig
Author : dist1ll
Score : 297 points
Date : 2023-09-18 12:00 UTC (11 hours ago)
(HTM) web link (alic.dev)
(TXT) w3m dump (alic.dev)
| cmrdporcupine wrote:
| In general I'm maybe somewhat disappointed that pattern matching
| in Rust can't be expressed more as a type system trait-type thing
| (hand waving here) that any struct could conform to, rather than
| an explicit first class hardcoded object type with its own
| storage structuring, etc?
|
| I may not be expressing this cogently, but. I've implemented both
| an AST (as in this article) and an opcode/bytecode interpreter
| recently, and I feel that in large part Rust's enums are not
| ideal for either:
|
| I never had the memory usage concerns about the AST that this
| author did, but I wanted to do things like: add a line number /
| column attribute to every statement node. I had a `Stmt` enum,
| and I had a choice: put line # and column on every enum
| possibility (boilerplate and ugly), or put the enum in a new Stmt
| struct that contains both the original enum and the line/col
| attributes -- which involved a bunch of refactoring and also
| didn't feel elegant. I feel there must be a more elegant type
| system way that my "Stmt" struct could have been reworked, that
| Rust isn't offering me.
|
| And re: the opcode stuff, I am fairly sure that a pattern-matched
| Rust enum is not an ideal encoding for a VM opcode interpreter
| performance wise. But the language really does want to point you
| in this direction, and the facilities for de-structuring patterns
| are very seductive. I haven't gotten to yak shaving this yet, but
| again I feel like there's likely some potential for improvements
| in the type system that could open things up so one could get
| pattern matching facilities while using one's own preferred lower
| level impl.
|
| I dunno. Thoughts.
| aw1621107 wrote:
| > pattern matching in Rust can't be expressed more as a type
| system trait-type thing (hand waving here) that any struct
| could conform to, rather than an explicit first class hardcoded
| object type with its own storage structuring, etc?
|
| Do you think you can provide a more concrete example? First
| thing that comes to mind for me is some kind of structural
| typing [0], but I'm not confident I'm interpreting your comment
| correctly.
|
| [0]: https://en.wikipedia.org/wiki/Structural_type_system
| cmrdporcupine wrote:
| Other people who are better with these things could probably
| conceptualize it better than me.
|
| What I'd like to be able to do is declare a trait that says
| "this thing can yield the following patterns, can destructure
| in this pattern, and these are the functions you'd call to
| extract references to those values"
|
| And then be able to use match on it as usual.
|
| Putting it another way, I wish an algebraic data type "enum"
| was something any struct could "be", with the type system
| providing the means to express that that's the case, and the
| programmer being able to provide the "how" and with existing
| Rust enums merely being the default way of doing this.
| dan-robertson wrote:
| Haskell and F# have this feature. In F# it is called
| _active patterns_. In Haskell the feature is _pattern
| synonyms_ (maybe combined with view patterns) and is used
| to e.g. pattern match on the prefix or suffix of a
| sequence-like structure, or to match a tree as an element
| and left/right node without the caller seeing a bunch of
| other tree metadata.
| aw1621107 wrote:
| Interesting idea. I feel a bit conflicted about it, but
| it's probably something I'd have to think about.
|
| I suppose it's possible to somewhat approximate this in
| existing Rust by creating a trait with a function that
| returns a destructure-able type, performing the
| transformations that you would have implemented in your
| hypothetical trait, then pattern match over that. Not sure
| if that would be "good enough".
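| A rough sketch of that workaround in today's Rust (all names here
| are hypothetical, invented for illustration):

```rust
// A "view" enum stands in for the patterns a trait promises to yield.
enum StmtView<'a> {
    Assign { target: &'a str, value: &'a str },
    Other,
}

// Any struct can opt in by describing how to project itself into the view.
trait Matchable {
    fn view(&self) -> StmtView<'_>;
}

struct Assignment {
    target: String,
    value: String,
    line: u32, // metadata callers never see through the view
}

impl Matchable for Assignment {
    fn view(&self) -> StmtView<'_> {
        StmtView::Assign { target: &self.target, value: &self.value }
    }
}

fn describe(m: &impl Matchable) -> String {
    // Callers match on the view, not on the struct's real layout.
    match m.view() {
        StmtView::Assign { target, value } => format!("{target} = {value}"),
        StmtView::Other => "?".to_string(),
    }
}
```

| The struct keeps its own storage layout (and extra metadata like
| line numbers) while still being match-able, at the cost of an
| explicit .view() call.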
| patrec wrote:
| Several languages have what I think you are asking for. Have a
| look at Scala's extractors or F#'s active views.
| foderking wrote:
| you mean f# active patterns?
| [deleted]
| eximius wrote:
| I do wish proc macros could evolve to be able to query the
| compiler for information (which would necessitate careful thought
| about adding stages to compilation). I've wanted to know "does this
| struct impl this trait" or "give me a list of all concretely
| implemented traits" or various things like this that would be
| very, very useful in a proc macro.
| Arnavion wrote:
| IIRC the compiler runs plugins in two stages. The first stage
| gets the AST before any typechecking and it can make
| modifications to the AST; macros and some clippy lints run in
| this stage. The second stage runs after typechecking so it gets
| type information, but it cannot make any modifications; other
| clippy lints run in this stage.
| npigrounet wrote:
| [dead]
| airstrike wrote:
| I only partially understand this article but this seems
| incredibly relevant to me since I'm planning on writing a
| spreadsheet engine in rust. Cell values require me to have
| something like (this very poorly written POC code, sorry):
|
|     pub enum Expr {
|         Number(i32),
|         String(String),
|         Reference(Position),
|         Binary(Box<Expr>, char, Box<Expr>),
|         Function(String, Vec<Expr>),
|     }
|
| I'll keep reading and studying! If anyone has more resources to
| throw at me, I'm all ears
| nhatcher wrote:
| Hey, I've actually built a spreadsheet engine in Rust. Not open
| sourced, but could give you some pointers. You are going to hit
| lots of performance issues before you can have some gains using
| the methods of the article. The single most difficult problem
| is the evaluation strategy.
| airstrike wrote:
| Hey, thanks for the offer. I have been thinking about
| evaluation and the extent of my strategy so far has been to
| consider it as a dependency graph, which led me to this
| interesting article:
| https://hyperformula.handsontable.com/guide/dependency-
| graph...
|
| Also it sounds like the right next step for me is to read
| more on incremental computation. Someone on reddit just
| pointed me to adapton:
| https://docs.rs/adapton/latest/adapton/
|
| Actually implementing anything with Rust is on my TODO list
| (I'm spending some time on the UI prototype right now)
|
| I will give a couple things a try and then, if you don't
| mind, I can ping you via email. I was checking out your
| website and loved the read on the parser -- not to mention A
| MAZE (played all the way to level 4!)
| dan-robertson wrote:
| (just jumping in here) You may want to take a look at
| https://www.scattered-thoughts.net/writing/an-opinionated-
| ma... which describes things on the spreadsheety and not-
| very-spreadsheety ends of the spectrum (moreso the latter).
| If you follow some of the links there you should find some
| good descriptions of various implementation strategies,
| e.g. Naiad style timely dataflow or incremental-style
| pseudotime.
| nhatcher wrote:
| Wow! Glad you had a look around. Yes, please send me an
| email whenever you feel like.
|
| And yes, you are going to need some sort of "support graph"
| and topological sorting. But this is way more complicated
| than it looks. Most functions in a spreadsheet don't
| know their dependencies until you are actually computing
| them, like `LOOKUP` or `IF`. Some functions like
| `RANDBETWEEN` or `NOW` change their value every time. In
| some modern spreadsheets some functions like `SORT` or
| `UNIQUE` spill their values to some other cells.
| Calculating dependencies correctly in all those situations
| can be challenging.
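| The static core of that (ignoring dynamic dependencies, volatile
| functions, and spills, which are exactly the hard parts mentioned
| above) is a plain topological sort. A sketch with hypothetical
| names, using Kahn's algorithm:

```rust
use std::collections::{HashMap, VecDeque};

// deps maps each cell to the cells it depends on. Returns a valid
// recalculation order, or None if there is a circular reference.
fn recalc_order(deps: &HashMap<&str, Vec<&str>>) -> Option<Vec<String>> {
    // indeg[c] = number of not-yet-computed cells c depends on
    let mut indeg: HashMap<&str, usize> = HashMap::new();
    let mut dependents: HashMap<&str, Vec<&str>> = HashMap::new();
    for (&cell, ds) in deps {
        indeg.entry(cell).or_insert(0);
        for &d in ds {
            *indeg.entry(cell).or_insert(0) += 1;
            dependents.entry(d).or_insert_with(Vec::new).push(cell);
            indeg.entry(d).or_insert(0);
        }
    }
    let mut queue: VecDeque<&str> =
        indeg.iter().filter(|&(_, &n)| n == 0).map(|(&c, _)| c).collect();
    let mut order = Vec::new();
    while let Some(c) = queue.pop_front() {
        order.push(c.to_string());
        for &dep in dependents.get(c).into_iter().flatten() {
            let n = indeg.get_mut(dep).unwrap();
            *n -= 1;
            if *n == 0 {
                queue.push_back(dep);
            }
        }
    }
    // A circular reference leaves some cells with nonzero in-degree.
    if order.len() == indeg.len() { Some(order) } else { None }
}
```

| Real engines have to go well beyond this, since `IF` and `LOOKUP`
| only reveal their dependencies during evaluation.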
|
| There is not a lot of literature on spreadsheet
| implementation and reading code of existing open source
| implementations is hard. An exception is Peter Sestoft:
|
| https://mitpress.mit.edu/9780262526647/spreadsheet-
| implement... and https://www.itu.dk/people/sestoft/funcalc/
|
| My recommendation would be to first set the scope. Then
| manually code the lexer and parser and drown those two in
| tests. Excel formulas can be quite complicated! There are
| lots of fancy things you can do there like use arena
| allocators and other optimizations. I would try to avoid
| that on a first go.
|
| Here is something that I have found extremely convenient.
| Excel saves the cell formula AND the resulting value.
| Meaning that to create a test case you just need to create
| an Excel workbook (and an xlsx parser, of course).
|
| Doing the UI for a spreadsheet is an equally challenging
| problem.
|
| Good luck!!!!
|
| Oh, and remember this thread from a couple of days ago.
| Might be inspiring:
| https://news.ycombinator.com/item?id=37527720
| hathawsh wrote:
| Sounds like a fun project. If it's intended for ordinary users,
| one thing you should expect them to do is add content to the 4
| extreme corners of a sheet and see if the spreadsheet engine
| falls over as a result. If you allow 1 million by 1 million
| cells, and you store a "null" for every cell not filled in,
| then the engine will run out of memory. That means you might
| consider storing the cell content in a sparse manner. One way
| to do that is with a hash map implementation such as
| "hashbrown". Just a thought.
|
| (To clarify why I'm bringing this up, the points in this
| article are low level details that you probably don't need to
| think about if you start with a hash map and thereby avoid
| running into memory constraints initially.)
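| A minimal sketch of that sparse layout (hypothetical names; note
| that std's own HashMap has been backed by hashbrown since Rust
| 1.36, so the separate crate mainly matters for its extra APIs):

```rust
use std::collections::HashMap;

// Sparse sheet: only filled cells consume memory, so content in the four
// extreme corners of a 1M x 1M grid costs four entries, not 10^12 nulls.
#[derive(Debug, PartialEq)]
enum Cell {
    Number(f64),
    Text(String),
}

struct Sheet {
    cells: HashMap<(u32, u32), Cell>,
}

impl Sheet {
    fn new() -> Self {
        Sheet { cells: HashMap::new() }
    }
    fn set(&mut self, row: u32, col: u32, v: Cell) {
        self.cells.insert((row, col), v);
    }
    // Empty cells are simply absent; None plays the role of "blank".
    fn get(&self, row: u32, col: u32) -> Option<&Cell> {
        self.cells.get(&(row, col))
    }
}
```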
| airstrike wrote:
| Thanks, great point indeed. I am looking into this
| https://github.com/rust-lang/hashbrown
|
| The way I think about it -- rather naively, I suppose -- is
| that I care more about the references cells make to each
| other than the actual grid of cells displayed on a table. The
| latter feels more like a "view" of the data than an actual
| data structure?
|
| This also seems to align with the relative priority of
| (sorted from highest to lowest): figuring out the order of
| evaluation, calculating those evaluations, and finally
| displaying the results of the evaluation
| vlovich123 wrote:
| The problem being highlighted is that if you have widely
| disparate sizes of the variants and/or a lot of these in an
| array, you're hurting performance because there's a lot of space
| being wasted on padding.
|
| A common technique that came from gaming is to break an array of
| structs (AoS), e.g.
|
|     array<struct { health: int, ammo: Ammo, ... }>
|
| into a struct of arrays (SoA):
|
|     struct Humans { healths: Vec<int>, ammo: Vec<Ammo>, ... }
|
| Then the ith index into those vecs is the ith Human in the AoS
| layout. (The parallel Vecs here are just for illustration; they're
| not optimally efficient, because the bookkeeping of length and
| capacity on a per-field basis is duplicated and wasteful.)
|
| This is basically a similar idea except for enums and doing it
| automatically in a way that Rust can't. It's possible the
| problem is a bit overrated in terms of how big of a problem
| this is. Also, in Rust you may be able to wrap the third party
| type in your own and then the derive macro approach probably
| can work.
|
| Re your spreadsheet, just keep this in mind as a possible
| optimization; you have to first figure out if you're
| building for speed (i.e. you've done this a few times and you
| know this is probably a bottleneck) or experience (i.e. build for
| simplicity and understanding).
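| For concreteness, the SoA side of that transformation might look
| like this in Rust (illustrative field names only):

```rust
// Minimal AoS -> SoA sketch. Each field lives in its own contiguous Vec;
// index i across all the Vecs describes one logical "Human".
struct Humans {
    healths: Vec<i32>,
    ammo: Vec<u32>,
}

impl Humans {
    fn new() -> Self {
        Humans { healths: Vec::new(), ammo: Vec::new() }
    }
    fn push(&mut self, health: i32, ammo: u32) {
        self.healths.push(health);
        self.ammo.push(ammo);
    }
    // Scanning one field touches only that field's memory -- the point of SoA.
    fn total_health(&self) -> i32 {
        self.healths.iter().sum()
    }
}
```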
| airstrike wrote:
| Thank you -- definitely building it for speed which is why
| this felt topical. I really appreciate your explanation, it's
| super clear.
| rudedogg wrote:
| On the subject of SoA, Zig has a cool data structure for
| that: https://zig.news/kristoff/struct-of-arrays-soa-in-zig-
| easy-i...
| sdfghswe wrote:
| More or less OT.
|
| I keep hearing about both Zig and D as interesting C-like low
| level languages with better safeties/ergonomics. I wonder if
| someone who's familiar with both would like to discuss the main
| differences...?
| esjeon wrote:
| D is a GC-based C++/Java-like OOP language. People used to call
| it "a C++ done right", but it was never simple and didn't have
| any clear edges against its competitors. TBH, one simply should
| not build on top of C++ - it's a beast that no one can
| comprehend, and it only keeps growing.
|
| In case of Zig, I think it's a very good modern C alternative.
| Zig can do what C can do, can interoperate w/ C (e.g. importing
| C headers), and offers new features that are useful (comptime,
| slices, error type). It's like Go, but without GC and a large
| runtime.
| sdfghswe wrote:
| Oh man, I was loving your description of Zig up until
| you said "it's like Go"...
| Guvante wrote:
| I haven't heard of Zig referred to as like Go except in the
| context of trying to replace C.
|
| (Rust is more aiming for C++)
| ksec wrote:
| Well, WalterBright could have answered that. But since HN
| started taking the piss out of WalterBright for replying in every
| single Zig (or Rust) thread, he stopped commenting on most if
| not all programming language topics. Sigh.
| saagarjha wrote:
| There are productive ways to comment on threads without being
| a shill for your own thing.
| Sharlin wrote:
| Zig is C-like. D is C++-like. Zig has first-class compile-time
| metaprogramming. D is a kitchen sink language; it has _many_
| more features than even C++ does. Zig is minimalistic and
| strictly imperative rather than multi-paradigm. D has a cult
| following but hasn't really taken off. Zig is a _relative_
| newcomer trying to prove itself.
| systems wrote:
| i think the term "D has a cult following" is a bit harsh D's
| community is small, that is it, nothing cult like, they dont
| over defend D most of the community seem to be happy using
| other languages etc ..
|
| but i agree with D being a kitchen sink language, trying to
| be many things, but never being the absolute best at
| anything, which i think is why it never really took off
|
| i am really hopeful for Zig, trying to be a more pragmatic C
| replacement
| Semaphor wrote:
| > i think the term "D has a cult following" is a bit harsh
| D's community is small, that is it, nothing cult like
|
| I think "cult following" is used here like you would for
| movies, it's not actually cult-like, it's more that it's a
| rather small, close-knit community.
| pron wrote:
| > Zig is C-like
|
| Only in the sense that the language is simple and small, but
| C pays for that by being inexpressive (you can find families
| of C++ programs for which an equivalent C program would
| require exponentially more code). Zig, on the other hand, is
| small and simple while being just as expressive as C++. It
| achieves that through a remarkable observation: many
| primitive features can be avoided with a powerful enough
| compile-time computation mechanism with introspection.
|
| Zig is so revolutionary in its design that it's hard to
| compare it to anything else. Or, put another way, any
| comparison is sure to miss out on some very foundational
| piece. So yes, in some sense Zig is like C (simple, low-
| level), but it's as expressive as C++. In another sense, Zig
| is like Scheme (minimalistic elegance), and yet it feels
| familiar and it does not rely on macros for its core power.
| In other ways it's like Go (easy to learn, easy cross-
| compilation), but it gives full control and is more flexible.
|
| At this point, Zig is sui generis.
| 1980phipsi wrote:
| > Zig is so revolutionary in its design that it's hard to
| compare it to anything else.
|
| I mean, Zig has an interesting combination of
| functionality, but I wouldn't say that the individual
| pieces are revolutionary. Who knows though, the whole could
| be more revolutionary than the sum of its parts...
| ngrilly wrote:
| Zig's take on comptime is quite new (which is the main
| point of the article being discussed). Error handling and
| async/await as well.
| 1980phipsi wrote:
| Zig's take on comptime is slightly different from C++ and
| D, but that doesn't make it revolutionary. It has first
| class types, which allows you to pass them to a function
| along with normal function parameters, instead of having
| a separate syntax for it like C++ and D. But it's not
| really new.
| pron wrote:
| That's a little like saying that smartphones weren't
| really new because we had mobile phones and calculators
| and cameras and video cameras and handheld games before.
|
| Zig's comptime is revolutionary and really new in that it
| replaces many other separate features and is the basis of
| Zig's minimalism. Figuring out how to discard features is
| much more difficult than adding them, and simplicity is
| something that's impossible to add once it's lost. The
| message is not the existence of a rather general compile
| time execution, but that with a particular design of such
| a mechanism you can do away with other things.
| pron wrote:
| To say something truly original you don't have to invent
| new words.
| Sharlin wrote:
| Yeah, I had written something like "Zig is pretty much
| designed around the comptime system" at first, should've
| probably kept it :D
| jandrewrogers wrote:
| The unapologetic embrace of compile-time metaprogramming in
| a clean model drives most of my interest in Zig. I use
| metaprogramming heavily in C++ to dramatically reduce code
| size and increase type safety. Contrary to popular
| assumption, in C++20 metaprogramming is relatively concise,
| tidy, and maintainable, unlike the eldritch horror that was
| metaprogramming in C++11. But I don't think anyone would
| describe it as elegant or designed. I haven't had time yet
| to try porting C++ metaprogramming to Zig in order to see
| what is possible. Most other systems-y languages have a
| much weaker metaprogramming story than C++.
|
| Many developers underestimate the importance of
| metaprogramming in systems languages for ensuring quality
| and security. Defects scale proportional to the lines of
| code. Strong metaprogramming facilities can reduce the
| lines of code required several-fold.
| gpderetta wrote:
| C++11 gave us variadics which opened a new
| metaprogramming world. For true horror try cons-based
| typelists in C++98.
| 1980phipsi wrote:
| I agree with most of this, but I would say that D was
| probably designed to be an alternative to C++ or Java. Zig
| shares many of C's design philosophies, though I think D's
| syntax is a little closer to C's than Zig's is to C's.
|
| D also has a betterC mode that removes some of those "kitchen
| sink" features, but I couldn't say how popular it's use is.
| Many people who use D like some of those features. Zig's more
| minimalistic approach to language features is closer to C's
| approach.
|
| Zig likely has been influenced by D (among other languages),
| but I wouldn't be surprised if D's new importC functionality
| has been influenced by Zig.
|
| I don't use D (or C or C++) at work, but learning D and using
| it at home has made me a much better programmer in the
| languages I use for work. Maybe I would have learned those
| lessons with C or C++, but I guess I took more easily to D
| than them.
| Sharlin wrote:
| > I agree with most of this, but I would say that D was
| probably designed to be an alternative to C++ or Java
|
| You're agreeing with me then! D was pretty explicitly
| created due to WalterBright's (incidentally a prolific
| 60k+-karma HN user!) frustration with C++ and presumably
| the problems that stem from its requirement to maintain
| almost 100% C compatibility. Bright is the founder of
| Digital Mars, a small company developing C and C++ (and D)
| compilers, so he has first-hand experience. Since 2007 D
| has been co-developed by Andrei Alexandrescu, a well-known
| C++ expert and one of the pioneers of template
| metaprogramming.
| [deleted]
| optymizer wrote:
| Having written a few toy compilers for C, to me D is the most
| elegant of the C families of languages.
|
| For example, consider the versatility of the '.' operator. It
| does pointer dereference, static field access and
| class/namespace member/method access. In all those cases I want
| to access "a thing from this other thing". Having 3 operators
| in a language (., ->, ::) is easier for a compiler writer but
| it puts more cognitive load on the developer.
|
| Now consider type extensions. As a language designer you could
| think: "if a developer wishes to add methods to a type Vector
| that exists in a third-party library, then in their project,
| the developer would use some special syntax", like maybe:
|
|     class Vector { void json() { ... } }
|
| and maybe the developer also needs to have it repeated
| across a header file and an implementation, but... why? It's an
| idea, but it's not a law of the Universe. There is a better
| way.
|
| Instead, the compiler can help you out. Just declare a regular
| function anywhere, like so:
|
|     void json(Vector v) { ... }
|
| which is not special in any way, and then be free to invoke it
| as:
|
|     json(v);
|
| or:
|
|     v.json();
|
| Why, if it isn't our familiar dot operator! It also does type
| extensions via some syntax sugar? What an elegant solution.
|
| This is just one feature of D, and it's filled with features.
| Though I don't really use D in my day to day work, it's
| wonderful to see how the authors thought about hard problems in
| language design and implemented elegant solutions. I think D
| deserves more recognition for the things it absolutely hit out
| of the park as a general purpose programming language.
| Joker_vD wrote:
| Wait, the thing the article calls "array of variant arrays (or
| AoVA)" is _not_ the canonical way to decompose tagged unions?
| Seriously? [AxBxC] ≅ [A]x[B]x[C], that's the most obvious
| thing, isn't it?
|
| Edit: Actually, never mind, as the comment points out, the
| correct equation should be [A+B+C] ≅ [A]x[B]x[C] and it's not
| obvious at all whether it's actually correct.
| pornel wrote:
| Things like that can be done by specific collections (e.g. most
| ECS implementations store components per "archetype").
|
| But from language perspective the limiting factor is that
| you're always allowed to take a reference to any instance of an
| enum. This means the tag must be stored in every instance. If
| you allow mutable references to enums, you must have enough
| space for writing any variant there through the reference.
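| A small Rust example of that constraint (types are illustrative):

```rust
use std::mem::size_of;

// Any &mut to an enum may be overwritten with a *different* variant, so
// every instance must carry a tag and reserve space for the largest variant.
enum Val {
    Small(u8),
    Big([u64; 4]),
}

fn overwrite(v: &mut Val) {
    *v = Val::Big([7; 4]); // legal through any &mut Val
}
```

| Because overwrite() is always allowed, a Val can never be stored
| as just one byte: size_of::<Val>() must cover the Big variant plus
| the tag.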
| ratmice wrote:
| Don't you mean something like [A+B+C] ≅ [A]x[B]x[C]?
| kccqzy wrote:
| Of course not: you are mistaking sum types and product types.
|
| Even if we assume they are product types, [AxBxC] ≅
| [A]x[B]x[C] is still not correct. The former doesn't allow the
| number of As and Bs and Cs to differ; the latter does. So the
| latter, strictly speaking, admits more values than the former.
|
| Since both of them allow an infinite number of values, to make a
| size comparison we will use generating functions. A list of A
| (or [A] in your notation) allows an empty list (one possible
| value), a one-element list (as many values as A itself), a two-
| element list... which becomes 1+A+A^2+...=1/(1-A). The
| beautiful thing about calling them sum types or product types
| is that you can manipulate them just by summing or multiplying
| respectively.
|
| So the number of values for [AxBxC] is given by 1/(1-ABC).
| For [A]x[B]x[C] it's 1/(1-A)/(1-B)/(1-C), which simplifies to
| 1/(1-A-B-C+AB+AC+BC-ABC). Now it becomes obvious+ that this form
| admits more values, and you can in a sense quantify how many
| more!
|
| +: Okay perhaps it's only obvious if I also include a Venn
| diagram but diagramming is beyond what I can do on HN.
| Joker_vD wrote:
| Yeah, and if you carefully expand
| 1+(A+B+C)+(A+B+C)^2+(A+B+C)^3+..., there'll be strictly more
| summands than in
| (1+A+A^2+A^3+...)x(1+B+B^2+B^3+...)x(1+C+C^2+C^3+...) due to
| more flexible ordering of elements, so they're not
| isomorphic.
|
| I still think it's the most obvious thing that [A+B+C] at
| least maps surjectively onto [A]x[B]x[C] :)
| catgary wrote:
| It's a section/retract, definitely not an isomorphism. You'll
| need to carry around the index data, and any sort of array
| update will incur a crazy upfront cost.
| Joker_vD wrote:
| Eh, I imagine it should be possible to maintain an out-of-
| line skip-list or an array of tombstones or something to deal
| with deletion/tag mutation (although in my experience with
| ASTs that's not that common of a requirement); and allocation
| can be done quite effectively with the help of the good old
| .bss section: just grab a couple million of every type of
| node at the program's startup; virtual memory is almost free.
| [deleted]
| kazinator wrote:
| You can solve this problem in a few clear lines of GNU C. We
| declare two sister types, one of which is packed:
|
|     typedef struct taggedunion {
|         unsigned char tag;
|         union tu {
|             uint64_t u64;
|             uint32_t u32;
|             uint8_t u8;
|         } u;
|     } taggedunion_t;
|
|     typedef struct __attribute__((packed)) packed_taggedunion {
|         unsigned char tag;
|         union tu u;
|     } packed_taggedunion_t;
|
| Then accesses to our dynamic array will convert between the
| packed stored representation and the unpacked one in a
| straightforward way:
|
|     taggedunion_t array_get(packed_taggedunion_t *a, size_t i)
|     {
|         return (taggedunion_t){ a[i].tag, a[i].u };
|     }
| WhereIsTheTruth wrote:
| For any language to be labeled a "better C" language, it
| must solve this problem; having better enums is a requirement.
|
| It's actually one of the reasons I initially gave Zig a try: not
| comptime, but tagged unions.
| cryptonector wrote:
| But it's C. The idea here is to have higher-level, type-safe
| languages do this transformation automatically.
| [deleted]
| [deleted]
| loeg wrote:
| Your packed union eliminates pure padding, yes, but it still
| costs 9 bytes to store a u8 value _and_ misaligns accesses for
| wider elements. These are both significant misses on "solving
| the problem." You also need to copy each element on access,
| which could be expensive for larger plain-old-data types, or
| especially for non-POD types.
| masfuerte wrote:
| But this still takes 9 bytes (at least) to store a tagged
| uint8_t.
| kazinator wrote:
| The article's approach requires 8 bytes, plus a separate
| array of 1 byte tags.
|
| The article reduces it to 4 bytes, plus an array of 1 byte
| tags, by dropping the requirement for a 64 bit member.
|
| If we have to grow the array, we have to do two
| reallocations.
|
| Caching is worsened: the tag and content are going to be in
| separate cache lines.
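| In Rust terms, the split-tag layout under discussion is roughly
| this (a hand-rolled sketch, not the article's Zig code):

```rust
// One byte-sized tag array plus one fixed-width payload array, instead of a
// single Vec of tagged unions where every element pays tag + padding.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Tag {
    U8,
    U32,
    U64,
}

struct PackedVec {
    tags: Vec<Tag>,     // 1 byte per element, contiguous
    payloads: Vec<u64>, // 8 bytes per element, no per-element tag or padding
}

impl PackedVec {
    fn new() -> Self {
        PackedVec { tags: Vec::new(), payloads: Vec::new() }
    }
    fn push_u8(&mut self, v: u8) {
        self.tags.push(Tag::U8);
        self.payloads.push(v as u64);
    }
    fn push_u64(&mut self, v: u64) {
        self.tags.push(Tag::U64);
        self.payloads.push(v);
    }
    fn get(&self, i: usize) -> (Tag, u64) {
        (self.tags[i], self.payloads[i])
    }
}
```

| As noted above, growth now means two reallocations, and the tag
| and payload of one element land in different cache lines.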
| masfuerte wrote:
| The final approach in the article is an array for each
| unique object size plus tagged indices/pointers. This would
| take one byte per uint8_t and doesn't suffer from the
| problems you mention, though it does have others. If memory
| pressure is your main problem it's a big win.
| BaculumMeumEst wrote:
| I was checking out gameboy emulators recently written in C++,
| Rust, and Zig that have seen recent development.
|
| I couldn't get the Zig projects to build, which is understandable
| since the language seems to have had a lot of breaking changes in
| its build system and the language in general (which is why it
| comes with a caveat not to use it in production yet).
|
| But given the sheer number of comments I've read over the years
| recommending Rust as a replacement for C++, I was more surprised
| to see that I couldn't really find a single Rust project that had
| correct timings (the frame rates were incorrect and audio
| distorted) and worked with most ROMs I tried.
|
| Meanwhile, the C++ projects I found built without issue or
| hassle, had correct timings, ran all the games I've tried, had
| lots of features, etc.
|
| What gives? Is emulation just not a good use case for Rust?
| 0xDEF wrote:
| Building large-scale C++ projects has become less of an issue
| because of CMake slowly becoming the default build tool.
| anymouse123456 wrote:
| An alternative opinion (that I happen to hold) is that the
| unbelievable pain and frustration of dealing with CMake has
| been at least one of the forces that have led to the Cambrian
| explosion of new programming languages and build systems in
| recent-ish years.
| ghosty141 wrote:
| While CMake is pretty annoying, the problem in my
| experience is people using it incorrectly and doing stupid
| stuff that you can't change if you're using the project as
| a library.
|
| If every library were properly "packaged", using CMake
| would be far less annoying or frustrating. Just recently we
| had a case where the author of a sizeable third-party
| C++ library set -Werror in his library, which wreaked havoc.
| [deleted]
| coldtea wrote:
| > _Meanwhile, the C++ projects I found built without issue or
| hassle, had correct timings, ran all the games I've tried, had
| lots of features, etc. What gives?_
|
| Aside from the fact that the C++ projects are from seasoned C++
| developers (who have existed in huge quantities for 3 decades
| now), not people starting with Rust as their first "native"
| language and using an emulator as a toy project? C++ has existed
| for 3 decades on top of Rust's entire life of 8 years since
| v1.0. And given the adoption distribution over time, most Rust
| devs have been using Rust for 1-4 years max.
|
| Or the fact that those C++ projects are probably much more
| mature, with more man-years in them compared to the Rust ones,
| some even being major community projects?
| raincole wrote:
| > What gives?
|
| There are still more good C++ programmers than good Rust
| programmers.
| tmccrary55 wrote:
| This.
|
| Many people are still learning Rust and an emulator is
| probably a good way to get your hands dirty.
|
| I'm doing something similar with a compiler project and my
| code is gradually getting better and more idiomatic.
|
| But I'm still doing a lot of stupid things and getting the
| hang of it.
| slikrick wrote:
| it's not at all about "quality of programmer" and you
| should never reduce extremely vague arguments to that. it's
| literally about the time needed for a quality program to be made
| and completed.
|
| those C++ emulators have often been in development for 3-5x
| as long as a similar Rust one.
| rahkiin wrote:
| I recently found https://github.com/Rodrigodd/gameroy to be a
| very complete implementation
| BaculumMeumEst wrote:
| I did see that one, but it doesn't seem to build on Mac.
| hu3 wrote:
| Related: for all the flak GC gets, C# was used to produce two
| amazing emulators:
|
| 1) https://github.com/Ryujinx/Ryujinx
|
| 2) https://github.com/TASEmulators/BizHawk
| WhereIsTheTruth wrote:
| ryujinx doesn't use the GC for their sub-systems; they use a
| mix of manual memory allocation through their allocators, or
| RC.
|
| For such a project to succeed, you have to be able to optimize
| your memory allocation strategy.
|
| If you just rely on the GC, you'll have bad surprises (why
| does my game stutter during gameplay?!). They already have
| multiple due to their double dip on the JIT; nothing prevents
| the JIT compiler from compiling during your gameplay, causing
| massive stutters, just like how shader compilation is an
| issue at runtime.
| ghosty141 wrote:
| Personally I think C# is a great language to get stuff done.
| The big problem it has is the portability (especially when it
| comes to using it with a GUI).
| agentultra wrote:
| GC often gets more flak than is deserved.
|
| Often such flak ignores the differences between throughput
| and latency.
|
| For long-lived processes you'll end up writing some kind of
| garbage collection system.
| guipsp wrote:
| Fwiw, ryujinx uses a jit on the hot path, and most (but not
| all!) bizhawk cores are not actually c#.
| hu3 wrote:
| Ryujinx uses Microsoft's .NET JIT technology [1], which is the
| best available for C# codebases. Nothing new here, since it
| needs to translate Switch's ARM instructions to x86 machine
| code somehow.
|
| [1] https://devblogs.microsoft.com/dotnet/the-ryujit-
| transition-...
| jackmott42 wrote:
| C++ has an ages old tried and tested ecosystem for game stuff
| which probably makes a lot of rendering and audio tasks easier.
| It could take Rust years to catch up even supposing the
| language is inherently better.
| Shish2k wrote:
| As somebody who has written the same gameboy emulator in C++,
| Rust, and Zig (as well as C, Go, Nim, PHP, and Python) - I have
| yet to find a place where language affected emulation
| correctness.
|
| Gameboy audio is kind of a pain in the ass (at least compared
| to CPU, which is fairly easy, and GPU, which is easy to get
| "good enough" if you don't care about things like palette
| colours being swapped mid-scanline) - and some languages take
| more or less code to do the same thing (eg languages which
| allow one block of memory to be interpreted in several
| different ways concurrently will make the "interpret audio RAM
| as a bunch of registers" code much shorter with less copying) -
| but in my case at least, each one of my implementations
| actually has the same audio distortions, presumably because I'm
| misreading some part of the hardware spec :P
|
| https://github.com/shish/rosettaboy/
|
| (Also yes, the zig version is currently failing because every
| time I look at it the build system has had breaking changes...)
| BaculumMeumEst wrote:
| Oh that's so cool! Thanks for sharing. I skimmed a few of
| these and I'll definitely take a closer look later. I enjoyed
| your observations on each language as well.
|
| From a reader standpoint, I enjoy reading emulators written
| in C the most. It sucks when C is missing features for
| complex tasks, but emulation seems to fit nicely into what it
| can do without black magic fuckery. Of course as someone not
| super experienced with C, writing it feels like an endless
| sea of "oh god why is this segfaulting".
|
| Rust OTOH bothers me to look at because there are so many
| sigils and language doodads for functionality that seems like
| it should be really straightforward but whatever, I'm sure
| that's just because I've barely used it and can't understand
| it. I've been learning C++ recently because learning
| materials are usually tailored to it for stuff I'm interested
| in (vulkan currently), while ignoring a constant nagging
| feeling that I should stop being a curmudgeon and take a closer
| look at Rust.
|
| I've tried Zig but am not looking too hard atm for many of
| the reasons you mentioned in your repository, I'm hoping the
| language will stabilize at some point and that LLMs will help
| with some of the repetition/verbosity.
| kristoff_it wrote:
| wrt Zig, according to some it's a great fit for emulators, so
| maybe that's why there are a few :^)
|
| https://www.youtube.com/watch?v=jkkUS0nmdsg
| ilyt wrote:
| ...probably the fact most of the C/C++ project you'd find in
| emulation exist longer than Rust itself ?
|
| I'd imagine most of them in Rust were written for fun rather
| than trying to best the established ones.
| wiz21c wrote:
| I'm currently writing an emulator in Rust (for Apple 2). For
| some aspects mine is better than its C++ counterparts... BUT
| those counterparts have _years_ and _years_ of little
| refinements that make a _huge_ difference in whether some games
| run or don't. There's no question about that.
|
| Besides Rust is just fine. I use a bit of threads, a bit of
| OpenGl and none of these is a problem. I'd venture to say that
| the rust compiler is real good and that allows me to code
| things without worrying about optimisation. Older emulators
| started years ago and often had to be optimized by hand, which
| leads to not-so-readable code; just your usual legacy code
| story.
|
| Finally, the cargo tool is a modern way to build the code and
| I've been able to port my code to Linux, windows, MacOs and
| raspberry pi without a single code modification (besides
| expected stuff like audio frequency, file system, etc being
| different).
|
| The Rust crowds are just too new to have produced good
| emulators. Give us time :-)
| raphlinus wrote:
| I'm also noodling on an emulator, shooting for cycle accuracy
| (to the point where you'd be able to run vapor lock demos).
| Mine passes the 6502 functional tests but not yet the Tom
| Harte suite. To add to the difficulty, I want to run it on an
| RP2040 and bit-bang the video. Is yours up in an open source
| repo? Mine isn't (yet) but I'm open to joining forces.
|
| I started mine about 6 years ago, put it on the shelf, and am
| getting back to it. Overall I think Rust is working well for
| it (and for pico DVI out), but I can also see the advantages
| of Zig for something this low-level and specialized.
| wiz21c wrote:
| 133 MHz, that's 133 cycles per instruction for a 6502
| (assuming you emulate only the 6502). Seems doable, but then
| there's the BCD stuff, and that's rather tough to emulate
| (you'll definitely need a lot of cycles to reproduce
| that)...
|
| But that sounds real cool. Would you share your project?
| raphlinus wrote:
| It's a bit tricky, but my decimal adc is 19 lines of
| code, all of which is fairly straightforward, mostly bit
| fiddling stuff that I expect will compile to a few
| instructions each.
|
| I need to go through an open source releasing process but
| I'll get that going. Thanks for your interest!
| panick21_ wrote:
| Are you serious? A single specific piece of software you like
| doesn't exist in a specific language and you blame the language
| for that? You see how that makes no sense, right?
| itishappy wrote:
| That's not the situation. The software does exist and doesn't
| work. "Is this a because of the language?" seems a reasonable
| question.
| pdimitar wrote:
| It's not a reasonable question at all. My first reaction
| was: "maybe the authors did it as a toy project". If nobody
| is paying them to do a professional and bit-perfect GameBoy
| emulator then there should be no expectations that the
| software is going to be good.
| itishappy wrote:
| That's an insightful reply, and an important point to
| consider. There's been a number of great answers in this
| thread, and the general consensus seems to be "the
| language is a great fit, but the projects haven't had
| time to mature."
|
| These seem (to me) like reasonable responses to a
| reasonable question.
| caskstrength wrote:
| > What gives? Is emulation just not a good use case for Rust?
|
| You fell for Rust PR (which is understandable, considering that
| both this site and Reddit are full of loud Rust
| fans/influencers). The language provides memory safety and some
| fancy ML-like syntax, and that is it. It won't implement a good
| emulator for you or make you a better SW developer.
| dogben wrote:
| The most accurate IBM PC emulator is in rust.
| https://github.com/dbalsom/martypc
| xmodem wrote:
| I think it's a perfectly fine use case, but we're not yet at the
| point where that community has reached a high enough level of
| rust adoption that there are a significant number of mature
| projects, even for something as relatively simple to emulate as
| a gameboy.
| insanitybit wrote:
| Emulation seems fine for Rust. I don't think there's a good
| answer to this anecdote. One option is that when I learned C++
| I built an emulator to learn how to write an emulator, whereas
| people writing emulators in Rust are perhaps often trying to
| learn Rust. Maybe that leads to a difference.
|
| It's basically impossible to say though, we'd need to do a full
| survey of the C++ and Rust emulator ecosystem and look at which
| bugs are most prevalent, control for authors' experience and
| skill, etc...
|
| edit: I'll also note that when I built an emulator there were
| guides for it in C++. I don't know what materials exist in Rust
| but building an emulator is actually a project that I would
| consider C++ for (if for learning purposes) just because it's
| so well documented how to build one using the language.
| sdfghswe wrote:
| > What gives?
|
| My guess based on no evidence whatsoever: The Rust community
| seems to have lots of low effort enthusiasts - people whose
| primary motivation is "use Rust" rather than "solve a problem".
| People whose motivation is to solve a problem (in this case
| build an emulator so they can play a game or whatever) tend to
| be a lot more oriented towards the end-result, rather than what
| tool they're using. I'm pretty confident this isn't an issue
| with the language.
| mkl wrote:
| I think it's more "learn rust" than "use rust", and nowadays
| a lot of partly written learning projects are published on
| GitHub. A lot of the similar "learn C++" projects happened
| earlier and either stayed private or are quite good by now.
| jandrewrogers wrote:
| Most of the developers I know that are learning Rust do not
| come from a systems language background. They have to learn
| the domain as well as the language, which is a high bar.
| Learning to write systems-like code well takes a lot of time
| and experience.
| mathstuf wrote:
| Timings and audio handling sound like logic things and not
| something Rust's advantages over C++ would inherently support
| better.
|
| If I had to hazard a guess, it's that the C++ projects are just
| more mature and have seen more "in-the-wild combat" to get to
| the state they're in now. Other factors that may contribute (in
| no particular order):
|
| - existing emulators are Good Enough and polishing a new one is
| less "urgent" than other problems in the current zeitgeist;
|
| - hobby project vs. no-longer-a-hobby project status; and
|
| - emulator projects in C++ of similar quality are just as
| prevalent, but are significantly older and, instead of being
| discoverable on GitHub, are abandoned on Sourceforge or
| personal websites.
| TonyTrapp wrote:
| That would be my guess as well. Emulation, interpretation of
| decades-old binary formats, and similar problems are hard to
| get right on the first try, they often need years of testing
| and breaking until they are right. A project that has been
| around for 20 years will always have an advantage in this
| space over a newcomer project.
| loup-vaillant wrote:
| Funny how being on GitHub makes you discoverable, while
| maintaining your own domain name does not...
|
| Actually no, that's not funny at all.
| elcritch wrote:
| You should add Nim to the list! There's a few implementations,
| though I'm not sure how complete they are. As others point out
| they're often used as ways to learn a language.
|
| Fun writeup: https://itwasscience.com/blog/gameboy
|
| Improving RosettaBoy performance: https://forum.nim-lang.org/t/9859
| InfamousRece wrote:
| I don't think Rust is inherently bad to write emulators. For
| example one of the best cycle-accurate IBM XT emulators is
| written in Rust: https://github.com/dbalsom/martypc
| simias wrote:
| Most people developing GameBoy emulators these days do it as a
| toy project, not a serious effort to create an ultra-accurate
| general purpose emulator. It's like writing a raytracer or a
| Pong clone.
|
| The best GameBoy emulators like Gambatte predate Rust by almost
| a decade and are often written in C++. Since GameBoy emulation
| is pretty much a solved problem there's no strong motivation to
| innovate in the scene.
|
| I've written several emulators in Rust, I'd argue that it's
| very well suited to this exercise, but it's also not an area
| where Rust shines particularly. There's no complex memory
| management in emulators usually since most memory buffers are
| just statically allocated constant-size arrays that mimic the
| physical RAM on the console.
| olafura wrote:
| So the question is why projects created in a language that has
| been around since before the devices being emulated are more
| mature than ones in a language that is 8 years old?
|
| Sorry to be so snarky. It probably depends on people's time
| working on it and their familiarity with async processes. This
| is an inherent problem with any media like that. With sound,
| it's primarily that it can't come before the image but can lag
| behind the image. Stuff like this.
|
| I don't think any programming language is going help with that
| because it's how we perceive things, not just how correct your
| code is.
| alex_lav wrote:
| > But given the sheer number of comments I've read over the
| years recommending Rust as a replacement for C++,
|
| > What gives?
|
| This is The Hype Wave, where something is new and exciting so
| it's recommended over the things of the past without regard to
| whether it's as stable/full featured/correct. People are
| excited about Rust so Rust has been jammed into just about
| everywhere to varying degrees of success/correctness. This is
| the same as Javascript/Node and Go over the last decade-ish.
| Zig and Mojo are probably next.
| bowsamic wrote:
| Which Rust emulators? Which C++ emulators?
|
| Perhaps the C++ emulators were serious, long-standing projects,
| and the Rust emulators were just hobby projects.
|
| Most hobbyists only care about getting a game on the screen.
| ForkMeOnTinder wrote:
| There's another strategy that has good storage efficiency and
| keeps the ability to iterate over elements. vec #1 is a list of
| tags, vec #2 keeps the byte offset of each element, and vec #3
| (not really a vec) is the packed variant data pointed to by vec
| #2. This:
|
| - is half the number of vecs of the author's final solution (3
| vs 6)
|
| - wastes no bytes for padding (unless needed for alignment)
|
| - lets you iterate in a cache-friendly way since the data is laid
| out in order in memory regardless of type
|
| - even lets you access items by index in O(1)
|
| Generally it has the same performance characteristics as a
| Vec<T>, but for heterogeneous data.
| pornel wrote:
| But if you need to mutate such collection, you'll end up
| writing a memory allocator (to handle removals, changes to
| larger variant, and to deal with fragmentation).
| dist1ll wrote:
| Storing byte offset inline, great idea. Thanks for mentioning
| that.
|
| Just note one disadvantage: if the byte offset is stored in
| memory, you have a data dependence on the traversal. So even
| though it's cache-friendly, it may cause serious memory stalls
| in the processor's pipeline.
| dan-robertson wrote:
| One trick for this is to leverage the branch predictor to
| guess where the next item will be (I first saw this trick for
| linked list traversal as in practice linked lists are often
| laid out linearly). Something like:
|
|     // or guess based on the size of the current item
|     let guessed_next_offset =
|         current_offset + (current_offset - previous_offset);
|     let actual_next_offset = byte_offsets[++i];
|     previous_offset = current_offset;
|     current_offset = guessed_next_offset;
|     if (current_offset != actual_next_offset)
|         // need to make sure the compiler doesn't 'optimise
|         // out' this if and always run the below line
|         current_offset = actual_next_offset;
|     // ...
|
| In the ordinary case, where the guessed offset is correct,
| the cpu predicts the branch is not taken, and no data
| dependency of ... on actual_next_offset is introduced. If the
| branch is mispredicted then that speculatively executed work
| with the wrong current_offset is dropped. This is a bit slow
| but in the case where this happens all the time, the branch
| predictor just gives you the slightly bad perf of the
| traversal with the data dependency (computing the guess will
| be negligible) except you pay if the guess was actually
| right.
| jadbox wrote:
| I'd love for someone to benchmark these variants to tease out
| their actual characteristics (particularly for x86).
| loeg wrote:
| Offsets can add a significant amount of space relative to small
| T, if their size is not optimized (e.g., 64-bit size_t and
| uint8_t T). So as long as we are careful about offset size this
| seems like a reasonable approach.
| dan-robertson wrote:
| Indeed part of this is pointed out in the video about Carbon
| linked in the OP: juxtaposition gives you a 'free' pointer
| going from each node of your data structure and for some
| tasks you don't really need more than that.
| bfrog wrote:
| I still have yet to understand if Zig actually provides memory
| safety in the same manner Rust does or not, it seems like its
| another C like language without memory safety? Or did I happen to
| miss that bit?
| dgb23 wrote:
| Zig provides tools to avoid memory related bugs, but it doesn't
| enforce it via compile-time-checked RAII/borrow checking etc.
|
| Nullability (and error handling) is compile time checked
| though.
| bfrog wrote:
| To me the biggest win in Rust is just that, the memory
| related bugs are nearly non-existent. Throw in a group of
| people working on something and this makes the project move
| just so much faster in both review time and time hunting
| bugs.
| cmrdporcupine wrote:
| IMHO the real big win w/ Rust is not so just memory safety
| from (some) leaks or segfaults but the fact that the borrow
| checker rules apply also to concurrency related scenarios
| and so race conditions and deadlocks become much more
| difficult to accidentally produce.
|
| Developers can generally be taught to handle memory
| ownership issues fairly well, esp when helped out with RAII
| smart pointers, etc.
|
| But in my experience when you throw concurrency in, even
| very high quality engineers tend to stumble and
| accidentally introduce race conditions.
|
| In this respect Rust has something over the garbage
| collected languages, too. Because Java, Go, JS, C#, etc
| _will_ effectively shield you from "use after free" but
| they will absolutely _not_ stop you from passing an
| unlocked, etc. reference to another thread and screwing
| yourself up royally; and Rust will.
| throwawaymaths wrote:
| Zig gives buffer overflow safety and null pointer safety but
| not use after free, double free, or stack pointer escape
| safety.
|
| Zig also gives you memory exhaustion safety which rust does
| not.
| pitaj wrote:
| What do you mean by memory exhaustion safety?
| oconnor663 wrote:
| Is there a memory exhaustion scenario where Rust does
| something other than panic?
| pitaj wrote:
| You can use the fallible allocation APIs such as
| `Vec::try_reserve`
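| For example, a minimal sketch using that API (the helper name
| is made up):

```rust
use std::collections::TryReserveError;

// Instead of aborting/panicking on OOM, the caller gets a Result
// back and can decide how to recover, e.g. for an attacker-supplied
// length field.
fn read_payload(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve(len)?; // fails gracefully if the allocation can't be made
    buf.resize(len, 0);
    Ok(buf)
}

fn main() {
    // A reasonable request succeeds...
    assert!(read_payload(1024).is_ok());
    // ...while an absurd one is rejected instead of killing the process.
    assert!(read_payload(usize::MAX).is_err());
    println!("ok");
}
```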
| kibwen wrote:
| Rust-the-language doesn't do any dynamic allocation, so it
| doesn't exhaust memory at all. The Rust stdlib currently
| guarantees an abort on OOM, but in the future this will be
| changed to a panic (which you can always configure to be an
| abort if you want). Both of these behaviors are memory-safe
| (it's unclear what "memory exhaustion safety" is referring
| to).
| throwawaymaths wrote:
| Also to clarify: there are platforms (like Linux) which
| de facto have infallible allocation; if you run out of
| memory the system will OOM-kill your process, and neither
| Zig nor Rust will save you from that.
|
| I believe in practice the most common place where
| fallible OOM is a big deal is in embedded systems
| programming
| 10000truths wrote:
| No language can save you from OOMs, but because Zig
| pervasively uses explicit memory allocation, it makes it
| easy to greatly mitigate OOM risks by front-loading all
| the fallibility:
|
| 1. Calculate and allocate all your required memory
| immediately upon process startup (including memory from
| OS resources like sockets)
|
| 2. Call mlockall to prevent the pages from being swapped
| out
|
| 3. Write "-1000" to /proc/self/oom_score_adj to avoid the
| OOM killer
|
| 4. Use your preallocated memory as a pool for all further
| allocations
|
| With the above approach, the application has full control
| over how to handle application-level OOMs (e.g. applying
| backpressure to connecting clients, or shrinking non-
| critical caches) once it is past the start-up stage.
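| Steps 1 and 4 above can be sketched as a toy bump allocator
| over a buffer grabbed once at startup (Rust, hypothetical
| names; the Linux-specific steps 2 and 3 are omitted):

```rust
use std::cell::Cell;

// All allocation fallibility is front-loaded into one call at startup;
// later "allocations" just carve the preallocated pool.
struct Pool {
    buf: Vec<u8>,
    used: Cell<usize>,
}

impl Pool {
    // The single point where the process can fail for lack of memory.
    fn with_capacity(n: usize) -> Self {
        Pool { buf: vec![0; n], used: Cell::new(0) }
    }

    // On exhaustion the application decides what to do (shed load,
    // shrink caches, ...) instead of the OS OOM killer.
    fn alloc(&self, n: usize) -> Option<usize> {
        let start = self.used.get();
        if start + n > self.buf.len() {
            return None;
        }
        self.used.set(start + n);
        Some(start) // offset of the carved region within the pool
    }
}

fn main() {
    let pool = Pool::with_capacity(64);
    assert_eq!(pool.alloc(48), Some(0));
    assert_eq!(pool.alloc(16), Some(48));
    assert_eq!(pool.alloc(1), None); // application-level OOM, handled by us
    println!("ok");
}
```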
| throwawaymaths wrote:
| > memory exhaustion safety
|
| Sure, this is more like DDoS mitigation, so it's memory-related
| safety, not memory safety.
| gpanders wrote:
| >Zig gives buffer overflow safety and null pointer safety but
| not use after free, double free
|
| It _can_ provide the latter two through the use of the
| `GeneralPurposeAllocator`, which tracks UAF and double free.
|
| Stack pointer escape safety is being actively researched
| (there are a few tracking issues, see [1]). I'm personally
| interested in this, I've written a decent-ish amount of Zig
| for toy/personal projects, and anecdotally stack pointer
| escape is the only type of memory error that has bitten me
| more than once (though to be fair, one of these cases was
| calling into a C API, so even Rust wouldn't have helped).
|
| More broadly, the ability to catch all forms of UB in
| Debug/safety builds is an accepted proposal [2], though
| whether or not it will be possible in practice is a different
| (and interesting!) question.
|
| [1]: https://github.com/ziglang/zig/issues/2646
|
| [2]: https://github.com/ziglang/zig/issues/2301
| SkiFire13 wrote:
| The way `GeneralPurposeAllocator` works is kinda scary
| though, since it may result in whole memory pages used only
| for very small allocations, effectively multiplying the
| memory usage of the program. Also kinda goes against the
| memory exhaustion safety, since now you're more likely to
| exhaust memory.
| saagarjha wrote:
| An allocator that does heap quarantining at the page level
| is not a "general purpose allocator". It is a debug tool.
| jmull wrote:
| Yeah, I don't think it's right to say Zig doesn't have use-
| after-free and double-free safety. If you want that, you
| can just use a general purpose allocator. Note that you
| can't forget to choose an allocator since they are
| explicit.
|
| Is this somehow harder than, say, choosing not to use
| "unsafe" in Rust?
|
| Maybe all that is missing is a linter to help enforce
| whatever memory-management policy you've decided on. That's
| not really needed for small, coherent teams, but would be
| important for using Zig in larger projects with multiple
| teams and/or disparate individual contributors. (Perhaps
| such a thing exists and I just don't know about it.)
|
| You might also be able to use an arena allocator where free
| is a no-op. That has different tradeoffs, but is also safe
| for use-after-free and double-free.
|
| As you say, stack escape is the main thing where Zig
| doesn't have a good memory-safety story yet (at least not
| that I've heard). I guess there are a few others that
| concern me when I see them on a list, though I haven't hit
| them in real life.
| saagarjha wrote:
| C has use-after-free and double-free safety if you patch
| out free. Not really a solution, is it?
| jmull wrote:
| What do you think the difference is between "patching out
| free" and allocators as a first-class feature? I'll bet
| you can think of a few rather significant ones if you
| try.
| throwawaymaths wrote:
| Static analysis at the IR level would be awesome. It
| could catch use-undefined, stack escape, and probably
| uaf/df as well so you don't have to lean on gpa's
| (potentially expensive) tricks. Plus there are times when
| you maybe don't want to use page allocator.
|
| As an aside. I'm not certain I understand how double free
| is memory unsafe (in the sense of "causing
| vulnerabilities")
| saagarjha wrote:
| A double free breaks invariants for the memory allocator.
| For example, the freed memory may have been handed out to
| someone else, and if you free it again, it will be marked
| as unused even though that other code relies on it still
| being available. One very classic way of exploiting a
| double free is a sequence like this:
|
| 1. Some code allocates memory.
|
| 2. The code frees the memory, but keeps a stale reference
| to it around. It is marked as unused by the allocator.
|
| 3. Some other code allocates memory. Maybe it's reading
| the password file off of disk. The allocator has some
| unused memory lying around, so it hands it out - but it
| turns out that this is actually just a reuse of the
| buffer from earlier. It is now marked as "in use" again
| by the allocator.
|
| 4. The code from earlier has a bug and frees the
| allocation again. This means that the allocation is now
| marked as "unused".
|
| 5. Another allocation request hands out this memory
| again. Maybe it's a buffer for user input? Well, it's
| been scribbled all over with other data now.
|
| 6. Someone asks to authenticate and the password checking
| code gets called. It has the password right here to check
| against...oh, wait, that memory got freed out from under
| it and overwritten with some attacker-controlled content!
| jmull wrote:
| > I'm not certain I understand how double free is memory
| unsafe (in the sense of "causing vulnerabilities")
|
| Perhaps there are some allocators where doing that hits
| UB. UB in memory allocation is probably always a memory
| safety issue. I would say if your code accepts any
| allocators where double-free could be UB then you've got
| a safety issue.
| xigoi wrote:
| > I still have yet to understand if Zig actually provides
| memory safety in the same manner Rust does or not
|
| It doesn't.
| bfrog wrote:
| So it's C with nicer macros and maybe some more sensible ways
| of dealing with arrays and such?
|
| For me that's not enough to move the needle.
| cmrdporcupine wrote:
| It's more than that in that it has a sane syntax and module
| and build system.
|
| I'm a Rust person by both day job and hobby, but I can see
| the niche Zig is in as being quite useful in particular for
| embedded work.
|
| One thing it definitely has over Rust is way better support
| for explicit per-structure allocator/allocation management.
| The allocator_api stuff in Rust has just been sitting
| unstable for years at this point, while Zig shipped with
| support for this immediately. Which is key for work in all
| sorts of systems level contexts.
|
| Probably the language I _want_ is somewhere between the
| two.
| bfrog wrote:
| I mean I really have a hard time going backwards on the
| whole idea of lifetimes and send/sync type traits which
| force escape hatches if you want to do something idiotic.
| The key piece here is, any project of significant size
| and usage, someone will try to do something stupid. Rust
| makes the stupid so obvious with unsafe blocks and
| annotations, that reviewing code from some random person
| on the internet is made easier. The tooling made the task
| of reviewing code mostly limited to the code being
| changed rather than the whole project.
|
| In C/C++ land you need to consider not only the code
| being changed but also the entire code base and how that
| code is used. It's not enough to look at the diff. You
| need to understand the context.
|
| Not to say that isn't also true with Rust to some degree,
| but the degree is usually in terms of logic and less "is
| some pointer/reference going to cause a
| race/segfault/buffer xflow?"
|
| The language itself doesn't define allocation; Box is in
| the stdlib, and this allows no_std libraries to deal
| with things as I guess most of us might expect, I'd think.
| It would be cool to allow for a global allocator though
| to enable std on more platforms with better semantics, no
| disagreement there.
| emmanueloga_ wrote:
| It doesn't, but it puts a RED BANNER in every place where
| memory is allocated, and you need to explicitly pass an
| allocator of your choosing whenever memory is allocated. It
| also contains no hidden allocations or function calls. So it
| forces you to think about your memory allocation patterns,
| and also includes some tools to catch memory leaks [1].
|
| 1: https://ziglang.org/learn/samples/#memory-leak-detection
| saagarjha wrote:
| That doesn't really help.
| maxmcd wrote:
| I found this article to be clarifying:
| https://www.scattered-thoughts.net/writing/how-safe-is-zig/
| bfrog wrote:
| In effect this answers my question pretty well, the answer
| seems to be a resounding "No there's no memory safety" at
| least in the practical or realistic sense that results in few
| to no memory lifetime errors.
| pornel wrote:
| Rust's type system ended up being Turing-complete anyway. It
| seems like it fell into the same trap as C++ templates: it's an
| accidental programming language in a programming language, but
| with way worse syntax.
|
| The lesson for language designers may be that every useful type
| system is doomed be Turing complete, so you may embrace it from
| the start, instead of trying to make it declarative.
| itishappy wrote:
| Fascinating! Now I'm curious, what causes this? Where does
| Turing completeness come from?
|
| Polymorphism? Inference? Higher-kinded types? (Probably not
| that last one, I don't think Rust has them.)
|
| Put another way, what would it take for a type system to NOT
| become Turing complete?
| AprilArcus wrote:
| All you need is a way to branch and a way to loop, so
| anything with conditional types and recursive types will be
| Turing Complete
| ynik wrote:
| Turing completeness doesn't have to come from a single
| feature, it often comes from a combination of features.
|
| e.g. the interaction between subtyping (e.g. inheritance) and
| generics (with variance) is tricky:
| https://www.cis.upenn.edu/~bcpierce/papers/variance.pdf
|
| It can be highly nontrivial to tell if a language actually
| has a Turing-complete type system: the 2007 Kennedy & Pierce
| paper made it likely that Java's was Turing complete, but it
| took until 2016 for that to be finally proven
| (https://arxiv.org/abs/1605.05274).
|
| > What would it take a for a type system to NOT become Turing
| complete?
|
| An analysis of all possible interactions between all features
| in the type system, building a formal proof that the type
| system is not turing complete. This is not really realistic
| for the style of complex generic type systems that
| programmers are now used to, it would need to be a vastly
| simpler language.
| avgcorrection wrote:
| Surely they knew that it would become a programming language in
| its own right. That's the fate of all such statically typed
| languages which give you type-level power but that aren't
| dependently typed. (Of course dependently typed languages might
| end up with other languages like its own _tactics_ language.)
| insanitybit wrote:
| I don't think anyone has ever been surprised to find that a
| type system (with inference) is turing complete.
| baq wrote:
| it's easier to make something Turing-complete than not... the
| most interesting example is probably Rule 110.
| wredue wrote:
| The point is that it'll end up that way, so just embrace it
| from the start.
| baq wrote:
| the lisp way. the problem is it results in software which
| is difficult to reason about due to all the compile-time
| execution, even if you have state-of-the-art compile-time
| debuggers (if any language has them, it is lisp).
| wredue wrote:
| I haven't so far found any zig code terribly hard to
| reason about, and it's important to know that compile-time
| type crafting (currently) doesn't support decls.
| Although you can madness your way around that, there are
| idiomatic ways to manage this.
|
| Comptime is definitely very powerful, and even the top
| people using zig frown upon "magic".
|
| But to be completely honest, "magic" shit just happens
| all the time if you enable it. Rust macro abuse to get
| "magic" comes to mind. If it happens in rust, it'll
| certainly happen in zig.
| Ar-Curunir wrote:
| That something can be Turing-complete in the worst case doesn't
| mean that you hit those worst-cases on a frequent, or even
| occasional basis.
| marcosdumay wrote:
| > instead of trying to make it declarative
|
| Wait, no. A language being Turing complete and declarative are
| completely independent things.
| dgb23 wrote:
| That's the point!
| hnfong wrote:
| > Rust's type system ended up being Turing-complete anyway. It
| seems like it fell into the same trap as C++ templates
|
| I hear it is a common trap.
|
| https://aphyr.com/posts/342-typing-the-technical-interview
| dist1ll wrote:
| author here. That's a really good observation kornel. Rust's
| philosophy of avoiding post-monomorphization errors at all
| costs is usually highly praised (followed by ridicule of
| C++ template errors) - but it comes at a cost. And when you
| stray away from the declarative way and start plastering const
| generic bounds everywhere you really feel the pain.
|
| Anyways, I think engaging in this topic as a community is super
| important. Cause that's the only way to push PLs forward and
| explore this massive space.
| packetlost wrote:
| This is sort of the approach that I've considered. Type systems
| are just predefined code/behavior baked into a language, but
| there's _technically_ nothing stopping you from arbitrarily
| extending the compiler like Lisps do with macros to support a
| type system that's written in the language itself.
| naasking wrote:
| Type Systems as Macros:
|
| https://www.khoury.northeastern.edu/home/stchang/pubs/ckg-
| po...
| mlochbaum wrote:
| I did this! Singeli is an Elixir-like compile-time language
| where types and variables are first-class values, on top of
| C-ish semantics that are meant to be more like "portable
| assembly". It's designed for high-performance code where you
| have to be able to control the specific instructions, but the
| overall strategy could be useful in other domains too.
|
| https://github.com/mlochbaum/Singeli
|
| And a podcast on it came out Friday:
|
| https://www.arraycast.com/episodes/episode62-what-is-singeli
| mk12 wrote:
| I listened to that, it was good! I was surprised there is an
| entire podcast about array programming. Looking forward to
| future episodes, and also to trying AoC with BQN again in a
| few months.
| [deleted]
| Aardwolf wrote:
| Genuine question:
|
| How do type systems so abstract and complex that they become
| Turing complete actually help?
|
| I definitely find it extremely helpful to be able to write
| containers of any type (like std::vector<T>), that saves a ton
| of code duplication, but beyond that what more is needed and
| why? What we're competing with here is: just write a function
| that operates on the data types you want and gets the job done.
|
| You're writing code for the CPU that has to actually do
| something, on actual known types.
|
| What programming task is simplified by having an "any" type or
| a type system that allows you to write pong-played-turn-by-
| turn-using-compiler-error-messages? It's cool that you can have
| an "any" type just like universal sets in set theory, but what
| real-life programming scenario does this simplify (you can
| already write containers that can contain anything you want
| without using such a thing as an "any" type)?
|
| At least not the kind of programming tasks I do, but admittedly
| I think fairly low level and prefer my types to have exact
| known amounts of bits, known signed integer convention and
| endianness so I can efficiently use shifts and get the bits I
| need, preferably with as little undefined behavior as possible.
|
| Asked differently: If one were to design a programming language
| that only has basic types (primitives, structs/classes, ...)
| and templates to allow functions/classes to operate on any type
| (but not more than that; substitute template type with the
| actual type, compile this, nothing more), what feature will
| users of the language be missing and complain about?
| munificent wrote:
| _> Asked differently: If one were to design a programming
| language that only has basic types (primitives, structs
| /classes, ...) and templates to allow functions/classes to
| operate on any type (but not more than that; substitute
| template type with the actual type, compile this, nothing
| more), what feature will users of the language be missing and
| complain about?_
|
| This is a really good question. There are some languages that
| work as you describe: SML and some others in that family.
| There are generic functions and types, but the type
| parameters are basically just placeholders. You can't _do_
| anything with a value whose type is a type parameter, except
| store it and pass it to things that also take type
| parameters.
|
| That gives you enough to write a nice reusable vector type.
| But it doesn't let you easily write a nice reusable hash
| table. You can, but users have to pass in an explicit hash
| function that accepts the key type every time they create a
| hash table.
|
| It might be nice if a type itself could indicate whether it's
| hashable and, if so, what its hash function is. Then, if you
| create a hash table with that key type, it automatically uses
| the hash function defined by that type.
|
| Now you need some sort of constraints or type bounds in your
| generics. That's what traits in Rust and bounds in Java and
| C# give you. (The literature calls it "bounded
| quantification".) It's a _big_ jump in complexity. But it
| does mean that now you can call functions /methods on
| arguments/receivers whose type is a type parameter, and those
| calls can be type checked.
|
| Bounds are themselves types, so what kinds of types can you
| use in bounds? Can they be generic? If so, what kinds of type
| arguments are allowed? Can you use type parameters from the
| surrounding type?
|
| For example, which of these are OK and which aren't (using
| Java-ish syntax):
|
|     class A<T extends Foo> {}         // 1.
|     class A<T extends Bar<Foo>> {}    // 2.
|     class B<T extends B<Foo>> {}      // 3.
|     class B<T extends Bar<T>> {}      // 4.
|     class B<T extends B<T>> {}        // 5.
|
| Any kind of bounded quantification will give you 1-3. What
| about 4 and 5? This is called "F-bounded quantification". Why
| would you want such a thing?
|
| Collections with fast look-up are important, which is why we
| extended our generics to enable us to write nice reusable
| hash tables. But some data types aren't easily hashed but can
| be easily ordered. A sorted collection is faster than an
| unsorted one.
|
| How would we write a generic sorted collection? We could
| require you to always explicitly pass in an ordering function
| for any given element type, but it would be nice if the
| element type itself could supply its ordering function.
|
| You could define a "Comparable" interface that a type can
| implement to support comparing an instance against another
| object of some type, like:
|
|     interface Comparable<T> {
|       int compareTo(T other);
|     }
|
| And then implement it on your type, like:
|
|     class Color implements Comparable<Color> {
|       int r, g, b;
|       int compareTo(Color other) => ...
|     }
|
| In our sorted collection, elements all have the same type, so
| the bound that we need looks like:
|
|     class SortedCollection<T extends Comparable<T>> { ... }
|
| Notice that we have "T" inside the bound. That's F-bounded
| quantification.
|
| Using type parameters inside a bound isn't the only place
| recursive types like this show up. Let's say you wanted to
| make a _generic_ type comparable. You'd do something like:
|
|     class Pair<T> implements Comparable<Pair<T>> {
|       T a, b;
|       int compareTo(Pair<T> other) => ...
|     }
|
| Now here, the implements clause is using not just the type
| parameter of the enclosing type, but the entire type.
|
| We had a couple of fairly modest goals:
|
| * Be able to create reusable hash tables where the hash
| function is inferred from the key type.
|
| * Be able to create reusable sorted collections where the
| comparison function is inferred from the element type.
|
| And in order to get there, we needed generics, bounds, and
| even F-bounded quantification.
|
| Adding even a little more usefulness to our collection types
| will quickly have us reaching for variance annotations,
| associated types, and even more exotic stuff.
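The arc above can be sketched in Rust, whose trait bounds play the role of the Java-ish `extends` clauses. The `Comparable` trait, `Color`, and `SortedCollection` here are illustrative, not from any library; idiomatic Rust would simply use the built-in `Ord` trait:

```rust
// A hand-rolled Comparable, mirroring the Java-ish interface above.
trait Comparable<T> {
    fn compare_to(&self, other: &T) -> i32;
}

#[derive(Clone, Copy, Debug, PartialEq)]
struct Color { r: i32, g: i32, b: i32 }

impl Comparable<Color> for Color {
    fn compare_to(&self, other: &Color) -> i32 {
        // Order by total brightness, just for illustration.
        (self.r + self.g + self.b) - (other.r + other.g + other.b)
    }
}

// The F-bound: elements must be comparable *to themselves*.
struct SortedCollection<T: Comparable<T>> {
    items: Vec<T>,
}

impl<T: Comparable<T>> SortedCollection<T> {
    fn new() -> Self { Self { items: Vec::new() } }

    // Keep items sorted on insert, using the type's own comparison.
    fn insert(&mut self, item: T) {
        let pos = self.items
            .iter()
            .position(|x| item.compare_to(x) < 0)
            .unwrap_or(self.items.len());
        self.items.insert(pos, item);
    }
}
```

The bound `T: Comparable<T>` is exactly the F-bounded quantification described above: the type parameter appears inside its own bound.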
| widdershins wrote:
| It's about when you want to enable optimisations or
| conveniences depending on the parameterized type.
|
| For example, in the std::vector<T> type, if T supports being
| moved, you want to use that when growing your vector for
| performance. If T doesn't support being moved, you will have
| to copy it instead.
|
| Boom: you've ended up with template metaprogramming.
| gpderetta wrote:
| > You're writing code for the CPU that has to actually do
| something, on actual known types.
|
| (Partial) specialization, the feature used to get templates
| to actually do something on actual types, is what principally
| allows templates to be Turing complete.
| lenkite wrote:
| Basically design dependent types into the language right
| from the beginning (like Idris) - such that types are first
| class and can be computed and manipulated like any other
| language construct.
| bhouston wrote:
| > Rust's type system ended up being Turing-complete anyway. It
| seems like it fell into the same trap as C++ templates: it's an
| accidental programming language in a programming language, but
| with way worse syntax.
|
| Most anything in software can easily end up Turing-complete
| though. It isn't that high of a bar unfortunately. I think most
| mature type and template systems end up as Turing-complete. To
| single out Rust for this is to ignore that this is just
| standard fare for mature type systems.
|
| TypeScript's types are Turing complete:
| https://github.com/microsoft/TypeScript/issues/14833
|
| Python's type hints are Turing complete:
| https://arxiv.org/abs/2208.14755
|
| Java generics are Turing complete:
| https://arxiv.org/abs/1605.05274
| egl2021 wrote:
| "It isn't that high of a bar unfortunately." --> "It isn't
| that high of a bar."
| 38 wrote:
| > Most anything in software can easily end up Turing-complete
| though.
|
| not really, if you actually pay attention when you're
| designing the type system:
|
| https://arxiv.org/pdf/2005.11710.pdf
| bhouston wrote:
| My experience with Turing-completeness in any custom DSL or
| similar is that you will generally get Turing-complete
| systems unless you work really really hard to avoid them.
| And it only takes one wrong move by anything anywhere in
| the system, and you'll end up with Turing-completeness.
|
| Instead of trying to avoid Turing-completeness, just expect
| it, embrace it and instead deal with its consequences to
| contain the fallout.
| 38 wrote:
| > Instead of trying to avoid Turing-completeness, just
| expect it, embrace it and instead deal with its
| consequences to contain the fallout.
|
| this seems like such a cop out. it's like saying "oh
| failure is a given, so don't even try to succeed". at
| least currently, Go generics are NOT Turing complete. the
| generics were designed in part specifically to avoid
| that. so just because Rust (and others) failed, doesn't
| mean it's impossible.
| insanitybit wrote:
| I don't think most people care at all that their type
| system is turing complete. It basically never comes up.
| taeric wrote:
| Agreed. Adding to this point, why should I care? Am I
| missing anything obvious on why you don't want it?
| insanitybit wrote:
| Not really. In theory it means your compiler might never
| terminate... that does not usually happen.
| bhouston wrote:
| > this seems like such a cop out. its like saying "oh
| failure is a given, so don't even try to succeed".
|
| It is a pragmatic cop-out yes. I found it is easier to
| make progress by assuming that you'll get to Turing
| Complete rather than investing in the time to avoid it. I
| found that the downsides of Turing Completeness are
| usually overhyped or primarily theoretical.
|
| > so just because Rust (and others) failed, doesn't mean
| it's impossible.
|
| It definitely is not impossible. But I don't know if it
| is worth it.
|
| In the end I want fast compile times, type safety, easy
| to maintain code and good error messages. I do not care
| if the type system is Turing Complete.
| 38 wrote:
| > In the end I want fast compile times, type safety, easy
| to maintain code and good error messages. I do not care
| if the type system is Turing Complete.
|
| I think the problem is some languages like Rust go too
| far with generics, which is probably what makes them
| Turing complete. for example, this is valid Rust code:
|
|     let mut handles = Vec::new();
|     for i in 0..10 {
|         let handle = do_work(i);
|         handles.push(handle);
|     }
|
| but you have to follow the code all the way to "do_work"
| before you ever find the type of anything. Go does not
| allow this. you need to either declare a concrete type:
|
|     var handles []int
|
| or an explicit generic type:
|
|     type slice[T any] []T
|     var handles slice[int]
|
| I think Rust is lose-lose, because the underlying type
| implementation is Turing complete, and I would argue the
| code is actually less readable because of the overuse of
| type inference.
| bhouston wrote:
| > I think Rust is lose lose, because the underlying type
| implementation is Turing complete
|
| I am not a Rust coder so I can't really comment.
|
| Currently, I do TypeScript, which also has a Turing
| complete type system, and I love it. Of course all things
| in moderation. Even though I could make a completely
| obtuse type design for projects, I try to not write code
| that others can not understand.
| insanitybit wrote:
| "if you actually pay attention" or, in other words, if you
| produce novel research into the area that's worthy of
| publishing, and noting that this is only just the start and
| has its own limitations:
|
| > The cost is that of requiring a whole program analysis
| and disallowing programs that would result in infinite
| instantiations (Section 5.3). Clearly, this is the
| beginning of the story, not the end.
| [deleted]
| dgb23 wrote:
| The point is that Zig basically circumvents issues
| surrounding that, by introducing "comptime", where you can
| just write regular Zig to achieve the same things.
|
| The article showcases a nice example of having this very
| direct power.
|
| But it really comes up more often than you think as soon as
| you actually have it.
|
| It's easy in Zig to allocate precisely based on computed
| values and then have the sizes as part of your types etc. It
| all falls out of some simple ideas and it's all just regular
| Zig code.
|
| "Types in Zig are values of the type type" from:
| https://ziglearn.org/chapter-1/
|
| So instead of making it hard to write incorrect programs, Zig
| makes it easy to write correct programs.
| littlestymaar wrote:
| The problem is that it ends up being dynamically-typed,
| like C++ templates. See:
| https://nitter.net/pcwalton/status/1369114008045772804
|
| > So instead of making it hard to write incorrect programs,
| Zig makes it easy to write correct programs.
|
| Well maybe in theory, but the current state of Zig is that
| it makes it hard to write programs no matter how correct
| because the compiler keeps crashing ¯\_(ツ)_/¯
| wredue wrote:
| What are you talking about? The tagged versions are
| usually quite stable and also get planned bug-fix follow-ups.
|
| If you follow master, you'll occasionally run in to
| crashes, which is true of any developing language. If you
| don't want that, follow tagged versions.
| ksec wrote:
| >instead of making it hard to write incorrect programs, Zig
| makes it easy to write correct programs.
|
| Or maybe rephrasing it (to avoid the word "instead",
| which may anger Rust supporters):
|
| _Rust Makes it hard to write incorrect programs, Zig makes
| it easy to write correct programs._
|
| I think this single sentence captures the philosophical
| difference between Rust and Zig. And of course there is no
| right or wrong in philosophy.
| dgb23 wrote:
| I fully agree. It's an interesting trade off that is
| worth thinking about.
| bhouston wrote:
| Here is a really good link that shows the power of Zig's
| comptime:
|
| https://kristoff.it/blog/what-is-zig-comptime/
| chrismorgan wrote:
| I am certainly sometimes envious of comptime and what it
| makes practical, but it's worth noting that it results in
| dynamically-typed generics, whereas Rust goes for
| statically-typed generics, which is in keeping with its
| goals. There are some significant general maintainability
| improvements in statically-typed generics; when you use
| dynamic generics, subtle changes in one place can cause
| obscure compile errors in a completely different and
| seemingly unrelated place (commonly called _post-
| monomorphisation errors_); this doesn't happen with static
| generics.
|
| So... I'm not sold on your wording that it's
| _circumventing_ issues, as it's choosing a different set of
| trade-offs. In shedding types-are-a-language-of-their-own,
| you also shed confidence about what's a breaking change,
| and make call sites more fragile. Decide for yourself
| whether it's worth it.
| anonymoushn wrote:
| What do you mean by dynamically-typed generics and
| statically typed generics here? I've looked up "post-
| monomorphization errors" and found some things about
| assertions about generic types failing because of the
| choices made by the user who passed in types or constants
| that do not work with the generic code. It seems like Zig
| libraries have the option of generating the errors at the
| right place if they place their assertions in the
| function that returns the type to the user, but they also
| have the option of generating the errors in the wrong
| place if they place their assertions in methods of the
| type.
|
| > So... I'm not sold on your wording that it's
| circumventing issues, as it's choosing a different set of
| trade-offs. In shedding types-are-a-language-of-their-
| own, you also shed confidence about what's a breaking
| change, and make call sites more fragile. Decide for
| yourself whether it's worth it.
|
| Client code can just look at all the members of all the
| structs so there's not really much hope for enforcing
| that changes cannot break any client code using compiler-
| adjacent tooling.
| dgb23 wrote:
| I'm still in the honeymoon phase with this language and
| learning, but I agree it's a trade off.
|
| For example your LSP isn't going to help you as much
| while you edit code.
|
| However being able to express arbitrary compile time
| constraints and pre-compute stuff without having to go
| through a code generation tool is really powerful. You
| can actually use all the knowledge you have ahead of time
| as long as you can express it in Zig.
|
| So far it seems like Zig is carving out a very strong
| niche for itself.
| convolvatron wrote:
| is there anything fundamental about using the same
| language at compile time to generate fully static typing
| that is boiled away? I don't know, but it doesn't seem
| so?
| jjtheblunt wrote:
| Can you show how you'd implement a Turing machine in Rust
| types?
|
| I am not seeing it, so am likely overlooking that.
| justinpombrio wrote:
| Strongly agreed: Rust's type system---and many
| other languages' type systems---is a parallel language to its
| runtime language. Parameterized types are like functions;
| instantiating a type parameter is like a function call; `impl
| ... where` does pattern matching. You get a pure functional
| language based on pattern matching and implemented with
| memoization. These type systems are often Turing complete _on
| purpose_, yet it's surprisingly hard to get an infinite
| loop. Like, both Rust's and Java's type systems are Turing
| complete, yet I've never hit an infinite compilation loop in
| either by accident.
|
| For more on type systems as programming languages:
| https://ductile.systems/oxidizing-the-technical-interview/ plus
| its links at the top.
|
| What I would like to see is a programming language where the
| runtime language and the comptime language are the same, or
| nearly the same, _and where the comptime language is type
| safe_. Zig isn't this: its `comptime` language is essentially
| dynamically typed. In Zig, if a function `f` takes a `comptime`
| argument `t: Type`, it can call `.print()` on it, and the
| compiler will just assume that that's fine. If you call `f`,
| you had better read the docs because the _type system_ won't
| tell you that `t` needs to have a `print()` method. If you're
| calling `f` directly, that's pretty straightforward, but the
| trouble comes when you actually call `h`, and `h` looks at the
| value of some string and based on that decides to call `g`, and
| `g` calls `f`, and the docs for `h` weren't entirely clear, so
| now you're seeing an error in `h`, which you didn't even call.
| Instead, if `f` is going to call `.print()` on a `t`, then its
| argument `t` can't just be a `Type`, the compiler should check
| that it's a `Type with method .print()->String`. This
| requirement would then flow to `g` and `h`, so the type
| signature for `h` is guaranteed to tell you that `t` is
| required to have a `print()` method.
|
| For more on merging runtime and comptime languages in a type
| safe way, see 1ML: https://people.mpi-sws.org/~rossberg/1ml/
|
| EDIT: Deleting my criticism of C++ templates lest it distract
| from the more substantial things I had to say above.
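As a contrast to the Zig scenario described above, here is a rough Rust sketch where the `.print()` requirement lives in the signatures themselves, so a caller of `h` sees the constraint without reading any docs (the `Print` trait and `Point` type are made up for illustration):

```rust
trait Print {
    fn print(&self) -> String;
}

struct Point { x: i32, y: i32 }

impl Print for Point {
    fn print(&self) -> String {
        format!("({}, {})", self.x, self.y)
    }
}

// The bound `T: Print` is part of f's contract. It propagates to g
// and h, so their signatures, not their docs, advertise that the
// argument must have a print() method.
fn f<T: Print>(t: &T) -> String { t.print() }
fn g<T: Print>(t: &T) -> String { f(t) }
fn h<T: Print>(t: &T) -> String { g(t) }
```

Passing a type without `Print` to `h` fails at the call site, not inside `f` after monomorphization.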
| gpderetta wrote:
| > What I would like to see is a programming language where
| the runtime language and the comptime language are the same,
| or nearly the same, and where the comptime language is type
| safe
|
| I have approximately 0 knowledge of it, but I think
| Template Haskell should do that.
| justinpombrio wrote:
| Nope. Haskell's type system guarantees that code you
| construct in Template Haskell is well-formed, but the code
| you construct is only type checked once, at the end. So if
| you have a function that constructs code, and the code it
| constructs has a type error in it, you won't find out
| unless you call the function. Just like Zig.
| avgcorrection wrote:
| > What I would like to see is a programming language where
| the runtime language and the comptime language are the same,
| or nearly the same, and where the comptime language is type
| safe.
|
| MetaOCaml might be the closest one.
| zozbot234 wrote:
| > and where the comptime language is type safe
|
| You need dependent types in order to do this, which means
| doing away with Turing-completeness. (Moreover, the principle
| 'Type is of type Type' as found in Zig comptime leads to
| type-unsafety. So you need to replace that with some notion
| of universes.)
| justinpombrio wrote:
| Well, you can have dependent types but limit their
| expressiveness. Rust, for example, has dependent (comptime)
| types, but they're extremely limited:
|
|     fn foo<const N: usize>() -> [f32; N]
|
| I imagine having arbitrary comptime code, but more limited
| use of comptime values in type position.
|
| Also, do you know the exact issue with "Type is of type
| Type"? I know that can lead to _non-termination_, and non-
| termination completely breaks proof assistants. For
| example, you can prove `False` with:
|
|     fn make_false() -> False { return make_false(); }
|
| But if you're not building a proof assistant, a function
| like `make_false()` is fine. Does it lead to any additional
| problems?
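To make the `[f32; N]` example above concrete, here is a sketch of what those limited const generics allow in stable Rust (function names are illustrative):

```rust
// N is a compile-time value that appears in the return *type*.
fn zeros<const N: usize>() -> [f32; N] {
    [0.0; N]
}

// The array length flows into the parameter type as well. Note that
// arithmetic on const parameters (e.g. returning [f32; N + 1]) is
// still largely off-limits in stable Rust.
fn sum<const N: usize>(xs: [f32; N]) -> f32 {
    xs.iter().sum()
}
```

This is the "limited expressiveness" trade-off: values can appear in types, but the compile-time language over those values is deliberately tiny.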
| throwaway894345 wrote:
| I think there's still value in pushing people towards something
| declarative. For example, a static type system might have an
| `any` type for things that aren't expressible statically but
| that doesn't mean type system designers should embrace dynamic
| typing from the start instead of trying to make it static.
| Similarly, it makes sense to have panics/exceptions in a
| language as an escape hatch for errors where the application
| can't reasonably do anything gracefully, but it's (arguably) a
| bad experience to take that to the extreme and panic for _all_
| errors.
|
| I think there's value in _accounting for the possibility_ that
| there will be edge cases that preclude hard-and-fast rules like
| "purely static" or "purely declarative", but I dislike the
| philosophy of projecting the 1% use case (e.g., "dynamic" or
| "turing complete") onto the 99% use case (where e.g., static
| and declarative would be ideal). I like when languages design
| for the 99% case and allow for escape hatches for the remaining
| 1% with the understanding that these escape hatches are
| intended to be used judiciously.
|
| To put it differently, embrace that there may be escape hatches
| in the initial design, but prefer to think of them as "escape
| hatches" with the entailed understanding that they should be
| rarely used.
| haberman wrote:
| 100 times this. C++ is progressively making more and more of
| the language and stdlib available at compile time (constexpr,
| consteval, constinit). Once this is accomplished, there will be
| _two_ very complicated Turing-complete programming languages
| available at compile time, one that is convenient to program
| (C++) and one that is not (C++ templates).
|
| Zig jumps straight to the finish line by making the main
| language available at compile time out of the gate, and by
| using it as the "generics language", Zig's generics are both
| simpler and more powerful.
| fweimer wrote:
| How does this AoVA data structure work in practice? Don't you
| lose index-based access, in the sense that arithmetic on indices
| is potentially no longer meaningful in the array sense? Iteration
| will not preserve insertion order.
|
| I think TLV (tag-length-value; length can be implied by the tag)
| is more commonly used in this context due to its better caching
| properties, and it least it provides meaningful forward
| iteration. See getdents/inotify/Netlink messaging.
| LegionMammal978 wrote:
| > How does this AoVA data structure work in practice? Don't you
| lose index-based access, in the sense that arithmetic on
| indices is potentially no longer meaningful in the array sense?
| Iteration will not preserve insertion order.
|
| The caption to figure 4 suggests that the AoVA pattern is not a
| good fit if you need to retain the full order of inserted
| elements:
|
| > Compared to the SoA layout from before, we have a partial
| order instead of a total order. So upon insertion, we get back
| a tagged index that holds both the enum tag and the index in
| the particular variant array.
|
| So by my reading, ordered access is considered out of scope
| here. (I suppose you could recover ordered iteration by storing
| a global index in each element, but that still wouldn't help
| with ordered random access, and it would likely lead to some
| really branchy code.)
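A rough Rust sketch of the AoVA shape under discussion (the `NodeStore` and `NodeRef` types are hypothetical; the article's implementation is in Zig): one homogeneous array per variant, with insertion handing back a tagged index rather than a position in one ordered sequence.

```rust
// One homogeneous Vec per variant, instead of one Vec of an enum
// padded to the size of its largest variant.
#[derive(Default)]
struct NodeStore {
    ints: Vec<i64>,
    pairs: Vec<(u32, u32)>,
}

// Insertion returns a tagged index: which variant array, plus the
// position inside it. Insertion order *across* variants is lost.
#[derive(Clone, Copy, Debug, PartialEq)]
enum NodeRef {
    Int(usize),
    Pair(usize),
}

impl NodeStore {
    fn push_int(&mut self, v: i64) -> NodeRef {
        self.ints.push(v);
        NodeRef::Int(self.ints.len() - 1)
    }
    fn push_pair(&mut self, v: (u32, u32)) -> NodeRef {
        self.pairs.push(v);
        NodeRef::Pair(self.pairs.len() - 1)
    }
}
```

This makes the partial-order point visible: a `NodeRef` locates an element exactly, but comparing two `NodeRef`s of different variants says nothing about which was inserted first.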
| catgary wrote:
| Yeah I imagine an array write that changes the type of an index
| would be insanely expensive...
| Aransentin wrote:
| In practice, changing the tag type is rarely done.
|
| In any case it wouldn't be particularly expensive; you'd have
| to make a new member of the destination type which is as
| expensive as an append. The "hole" that remains in the
| original vector can be filled in constant time by taking the
| last member of it and moving it there, then shrinking its
| size by 1.
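That constant-time hole-filling is exactly what Rust's `Vec::swap_remove` provides. A sketch of a tag-changing move under that scheme (the `retag` helper and its two-variant setup are hypothetical):

```rust
// Move a value out of one variant array into another. swap_remove
// fills the hole with the last element in O(1), at the cost of
// reordering that array.
fn retag(ints: &mut Vec<i64>, pairs: &mut Vec<(i64, i64)>, idx: usize) -> usize {
    let v = ints.swap_remove(idx); // last element of `ints` moves into `idx`
    pairs.push((v, v));            // re-insert under the new "tag"
    pairs.len() - 1                // index in the destination array
}
```

One caveat: any tagged index that pointed at the old last element of `ints` now dangles and must be patched to point at `idx`.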
| MobiusHorizons wrote:
| I'm sure there are classes of problem where that's not
| unusual. Even in the stated example of an AST, there are
| use-cases for code-fixers (eg lint fixers) that operate on
| the AST.
| dan-robertson wrote:
| Don't those things usually work on concrete syntax trees?
| And shouldn't the solution be to use the right data
| structure for the job (using language feature to make
| this easy) rather than pushing one system (e.g. the
| compiler) to be worse for the sake of another tool (the
| linter)?
| dylukes wrote:
| AoVA lacking its own total ordering on indices might be a
| problem for some use cases, but not for the one that is
| suggested here: AST nodes.
|
| In these cases, the arrays can be just one component (could
| think of it as an arena) of a heap-ish structure. [1]
|
| The cost is that your indices now need to be two dimensional
| (tag_idx, va_for_tag_idx). But the number of tags is known at
| compile time and you can optimize storage by packing so that
| tag_idx is the upper 4-5 bits and va_for_tag_idx uses the rest.
|
| See: [1]
| https://www.cs.cornell.edu/~asampson/blog/flattening.html
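The bit-packing described above can be sketched as follows (assuming a 32-bit index with 4 tag bits, i.e. up to 16 variants and about 268M elements per variant array):

```rust
const TAG_BITS: u32 = 4;
const IDX_MASK: u32 = (1 << (32 - TAG_BITS)) - 1;

// Pack the variant tag into the upper 4 bits and the per-variant
// index into the lower 28 bits of a single u32.
fn pack(tag: u32, idx: u32) -> u32 {
    debug_assert!(tag < (1 << TAG_BITS) && idx <= IDX_MASK);
    (tag << (32 - TAG_BITS)) | idx
}

fn unpack(packed: u32) -> (u32, u32) {
    (packed >> (32 - TAG_BITS), packed & IDX_MASK)
}
```

The two-dimensional index (tag_idx, va_for_tag_idx) thus fits in the same 32 bits a plain array index would use.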
| skybrian wrote:
| > upon insertion, we get back a tagged index that holds both
| the enum tag and the index in the particular variant array.
|
| These are essentially pointers. If you want to iterate, you
| store the pointers in an array in the order you want to use.
| It's the same thing a program would do if it allocated memory
| from a heap.
|
| Storing things based on their size is also done by garbage
| collectors and general-purpose allocators. They might get some
| efficiency from knowing all possible object sizes, though.
| Also, like an arena, they could gain some efficiency from
| having a simpler way of deallocating.
| wyldfire wrote:
| > In it he describes how a parsed clang AST regularly consumes
| 50x more memory than the original source code!
|
| I guess that seems like a lot but the context I'm missing is -
| how good could it get? I mean, if you want to preserve source
| locations of each token and you need to encode enough information
| in the AST to properly recover, what's the ideal growth factor
| over the original? 1.5x? 15x?
| anonymoushn wrote:
| As a point of comparison, I think simdjson tape is only ~3x
| larger than the original document. Much of this could be saved
| if one did things like stuffing numbers into only one tape slot
| or not copying strings that have no escape sequences in them
| (instead referencing their location in the original document).
|
| The largest overhead you could achieve is apparently 8x for a
| document that is mostly the characters "[]" or also 8x for a
| document that is mostly the characters "0,"
| halayli wrote:
| There's a ton more context that needs to be held in a clang
| AST node compared to a JSON doc. Just to mention a few: file
| source, token range, expr/stmt/decl statements and their
| relation, macro expansion related info, and tons of diag to
| be able to walk back to the original source token, and
| padding added by the compiler in those data structures.
| cryptonector wrote:
| If you can achieve a memory savings of 30%, say, that would be
| pretty big news. If you can only achieve a 30% memory savings
| by doing things to the compiler that make it harder to work on
| the compiler in the future, then it's probably not worthwhile.
| But if you can achieve an 80% memory savings by doing violence
| to the compiler, that might be worth the trouble.
|
| As for what's an ideal source->AST expansion factor for a
| language that has a user-friendly compiler that's also compiler
| dev-friendly, that's hard to say. Clearly 50x is workable.
|
| In TFA the 50x expansion factor is used as a motivator for
| automating a particular type of optimization. It'd be very
| interesting to see a Rust vec-of-enums that automatically
| deconstructs enum values into tags and opaque values that it
| could store in a struct-of-arrays like TFA does in Zig. The
| places to bury unsafe use into for this would not be many.
| fnord123 wrote:
| Maybe I'm just dumb but isn't the endgame of vec of enums
| going to be an ECS?
| [deleted]
| dan-robertson wrote:
| I think you should just watch the linked talk. It's excellent.
| Though they don't put exact numbers on it iirc (perhaps too
| early to be confident - they might be small due to missing data
| they don't yet realise they'll need).
| zoogeny wrote:
| As an aside and unrelated to the content of the article ... but
| the form of the article, with footnotes in the margin, is really
| useful.
| dist1ll wrote:
| Thanks! It's inspired by this
| https://edwardtufte.github.io/tufte-css/
| ammar2 wrote:
| Another quick question for you, what are you using to make
| those hand-drawn looking diagrams? They look amazing.
| dist1ll wrote:
| I'm using Excalidraw :)
|
| [0] https://excalidraw.com/ [1]
| https://github.com/excalidraw/excalidraw
| ghosty141 wrote:
| Thanks for sharing this! Looks fantastic, I'll definitely
| incorporate this into my own blog once it's finished.
| chrismorgan wrote:
| If you're interested in the general concept of sidenotes
| and labelling figures in the margin, here's my take on it
| for further inspiration:
| https://chrismorgan.info/blog/2019-website/, with
| https://chrismorgan.info/blog/rust-fizzbuzz/ exhibiting
| figures.
|
| Do search around for other sidenote implementations, too. I
| like the markup and presentation I end up with, but you'll
| also find other approaches to mobile support in particular,
| involving things like checkbox hacks to toggle visibility,
| or just flat-out requiring JavaScript.
| chrismorgan wrote:
| Generally speaking: sidenotes are excellent, footnotes are
| fine, endnotes are terrible. On the web, as a typically-not-
| paged medium, "footnotes" practically always actually means
| endnotes.
| [deleted]
___________________________________________________________________
(page generated 2023-09-18 23:01 UTC)