[HN Gopher] Ask HN: A retrofitted C dialect?
___________________________________________________________________
Ask HN: A retrofitted C dialect?
Hi I'm Anqur, a senior software engineer with different backgrounds
where development in C was often an important part of my work. E.g.
1) Game: A Chinese/Vietnam game with C/C++ for making
server/client, Lua for scripting [1]. 2) Embedded systems:
Switch/router with network stack all written in C [2]. 3)
(Networked) file system: Ceph FS client, which is a kernel module.
[3] (I left some unnecessary details in the links, but they are real
projects I used to work on.) Recently, there's a hot topic about
Rust and C in kernel and a message [4] just draws my attention,
where it talks about the "Rust" experiment in kernel development:
> I'd like to understand what the goal of this Rust "experiment"
is: If we want to fix existing issues with memory safety we need to
do that for existing code and find ways to retrofit it. So for
many years, I keep thinking about having a new C dialect for
retrofitting the problems, but of _C itself_. Sometimes big
systems and software (e.g. OS, browsers, databases) could be made
entirely in different languages like C++, Rust, D, Zig, etc. But
typically, like I slightly mentioned above, making a good
filesystem client requires one to write kernel modules (i.e. to
provide a VFS implementation. I do know FUSE, but I believe it's
better if one could use VFS directly), it's not always feasible to
switch languages. And I still love C, for its unique "bare-bone"
experience: 1) Just talk to the platform, almost all the platforms
speak C. Nothing like Rust's PAL (platform abstraction layer) is
needed. 2) Just talk to other languages, C is the lingua franca
(except Go needs no libc by default). Not to mention if I want
WebAssembly to talk to Rust, `extern "C"` is needed in Rust code. 3)
Just a libc, widely available, write my own data structures
carefully. Since usually one is writing some critical components of
a bigger system in C, it's just okay there are not many choices of
existing libraries to use. 4) I don't need an over-generalized
generics functionality, use of generics is quite limited. So
unlike a few `unsafe` in a safe Rust, I want something like a few
"safe" in an ambient "unsafe" C dialect. But I'm not saying
"unsafe" is good or bad, I'm saying that "don't talk about unsafe
vs safe", it's C itself, you wouldn't say anything is "safe" or
"unsafe" in C. Actually I'm also an expert on implementing
advanced type systems, some of my works include: 1) A row-
polymorphic JavaScript dialect [5]. 2) A tiny theorem prover with
Lean 4 syntax in less than 1K LOC [6]. 3) A Rust dialect with reuse
analysis [7]. Language features like generics, compile-time eval,
trait/typeclass, bidirectional typechecking are trivial for me, I
successfully implemented them above. For the retrofitted C, these
features initially come to my mind: 1) Code generation directly to
C, no LLVM IR, no machine code. 2) Module, like C++20 module, to
eliminate use of headers. 3) Compile-time eval, type-level
computation, like `malloc(int)` is actually a thing. 4) Tactics-
like metaprogramming to generate definitions, acting like type-safe
macros. 5) Quantitative types [8] to track the use of resources
(pointers, FDs). The typechecker tells the user how to insert
`free` in all possible positions, don't do anything like RAII. 6)
Limited lifetime checking, but some people tell me lifetime is not
needed in such a language. Any further insights? Shall I kickstart
such project? Please I need your ideas very much. [1]:
https://vi.wikipedia.org/wiki/V%C3%B5_L%C3%A2m_Truy%E1%BB%81...
[2]: https://e.huawei.com/en/products/optical-access/ma5800 [3]:
https://docs.ceph.com/en/reef/cephfs/ [4]:
https://lore.kernel.org/rust-for-linux/Z7SwcnUzjZYfuJ4-@infr...
[5]: https://github.com/rowscript/rowscript [6]:
https://github.com/anqurvanillapy/TinyLean [7]:
https://github.com/SchrodingerZhu/reussir-lang [8]:
https://bentnib.org/quantitative-type-theory.html
Author : anqurvanillapy
Score : 42 points
Date : 2025-02-22 08:11 UTC (3 days ago)
| leecommamichael wrote:
| We seem to have the same desire for a "cleaned up C." Could you
| say more about how metaprogramming would work? I doubt you want
| to put lifetimes into the type system to any degree. The reason C
| compiles so much quicker than C++ is the lack of features. Every
| feature must be crucial. Modules are crucial to preserving C.
| anqurvanillapy wrote:
| > We seem to have the same desire for a "cleaned up C."
|
| That's so great! But sad that not enough ideas and arguments came
| up here. :'(
|
| > How metaprogramming would work?
|
| When it comes to "tactics" in Coq and Lean 4 (i.e. DSL to
| control the typechecker, e.g. declare a new variable), there
| are almost equivalent features like "elaborator reflection" in
| Idris 1/2 [1] (e.g. create some AST nodes and let typechecker
| check if it's okay), and most importantly, in Scala 3 [2], you
| could use `summonXXX` APIs to generate new definitions to the
| compiler (e.g. automatically create an instance for the JSON
| encoding trait, if all fields of a record type are given).
|
| So the idea is like: Expose some typechecker APIs to the user,
| with which one could create well-typed or ready-to-type AST
| nodes during compile time.
|
| [1]: https://docs.idris-
| lang.org/en/latest/elaboratorReflection/e...
|
| [2]: https://docs.scala-
| lang.org/scala3/reference/contextual/deri...
|
| > Lifetime and compilation speed.
|
| Yes exactly, I was considering features from Featherweight Rust
| [3], some subset of it might be partially applied. But yes, one
| should be super careful about bringing in new features, for the
| sake of compilation speed.
|
| It's also worth mentioning that the C compiler itself does some
| partial "compile-time eval" like constant folding, during
| optimization. I know some techniques [4] to achieve this during
| typechecking, not in another isolated pass, and things like
| incremental compilation and related caching could bring
| benefits here.
|
| [3]: https://dl.acm.org/doi/10.1145/3443420
|
| [4]: https://en.wikipedia.org/wiki/Normalisation_by_evaluation
|
| > Every feature must be crucial.
|
| I want to hear more of your ideas on designing such language
| too, and what's your related context and background for it BTW,
| for my curiosity?
| fithisux wrote:
| C3 to C compiler could be a proposal.
| anqurvanillapy wrote:
| Ah that should be good for source-level compatibility. But I'm
| thinking about extending existing codebase that crosses between
| the kernel and user space, e.g. DPDK, SPDK, FUSE, kernel
| module, etc. Curious how C3 would be adopted in such
| projects.
| fithisux wrote:
| Start small.
| anqurvanillapy wrote:
| And then? https://github.com/anqurvanillapy/TinyLean
| fithisux wrote:
| Very very interesting for me. I always wanted to do
| something similar for Maude in Golang (Python is not a
| bad choice).
|
| Currently my focus is on data engineering, but I can use
| it as an inspiration.
|
| I talked about a C3 to C translator; this is what I meant by
| "start small".
| Rochus wrote:
| There are approaches with at least partly the same goals as you
| mentioned, e.g. Zig. Personally I have been working on my own C
| replacement for some time which meets many of your points (see
| https://github.com/micron-language/specification); but the syntax
| is derived from my Oberon+ language, not from C (even if I use C
| and C++ for decades, I don't think it's a good syntax); it has
| compile-time execution, inlines and generic modules (no need for
| macros or a preprocessor); the current version is minimal, but
| extensions like inheritance, type-bound procedures, Go-like
| interfaces or the finally clause (for a simple RAII or "deferred"
| replacement) are already prepared.
| anqurvanillapy wrote:
| > There are approaches e.g. Zig.
|
| Yes! Zig has done a great job on many C-related stuff, e.g.
| they've already made it possible to cross-compile C/C++
| projects with Zig toolchain years ago. But I'm still quite
| stupidly obsessed with source-level compatibility with C, don't
| know if it's good, but things like "Zig uses `0xAA` on
| debugging undefined memory, not C's traditional `0xCC` byte"
| make me feel Zig is not "bare-bone" enough to the C world.
|
| > Micron and Oberon+ programming language.
|
| They look absolutely cool to me! The syntax looks inspired from
| Lua (`end` marker) and OCaml (`of` keyword), CMIIW. The
| features are pretty nice too. I would look into the design of
| generic modules and inheritance more, since I'm not sure what a
| good extendability feature would look like for the C users.
|
| Well BTW, I found there's only one following in your GitHub
| profile and it's Haoran Xu. Any story in here lol? He's just
| such a genius making a better LuaJIT, a baseline Python JIT and
| a better Python interpreter all happen in real life.
| Rochus wrote:
| > _The syntax looks inspired from Lua (`end` marker) and
| OCaml (`of` keyword), CMIIW_
|
| Oberon+ and Micron are mostly derived from Wirth's Oberon and
| Pascal lineage. Lua inherited many syntax features from
| Modula-2 (yet another Wirth language), and also OCaml
| (accidentally?) shares some keywords with Pascal. If you are
| interested in even more Lua similarities, have a look at
| https://github.com/rochus-keller/Luon, which I published
| recently, but which compiles to LuaJIT and thus serves
| different use-cases than C.
|
| > _I would look into the design of generic modules_
|
| I found generic modules to be a good compromise with
| simplicity in mind; here is an article about some of the
| motivations and findings: https://oberon-
| lang.github.io/2021/07/17/considering-generic...
|
| > _Haoran Xu, making a better LuaJIT_
|
| You mean this project: https://github.com/luajit-
| remake/luajit-remake? This is a very interesting project and
| as it seems development continues after a break for a year.
| woodrowbarlow wrote:
| > source-level compatibility with C
|
| not sure if this is exactly what you meant, but in Zig you
| can #include a C header and then "just" invoke the function.
| no special FFI syntax or typecasting (except rich enums and
| strings). it can produce compatible ASTs for C and Zig.
| SleepyMyroslav wrote:
| I think you don't need any rants but here it goes anyway.
|
| Ditching headers does not solve anything at least if your
| language targets include performance or my beloved example
| Gamedev =) . You will have to consume headers until operating
| systems stop using them. It is a people problem, not a
| language problem.
|
| Big elephants in the room I do not see in your list:
|
| 1) "threading" was bolted onto languages like C and C++ without
| much groundwork. Rust kinda has an idea there but it's really
| alien to everything I saw in my entire 20+ year career with C++.
| I am not going to try to explain it here to not get downvoted into
| oblivion. Just want you to think that threading has to be natural
| in any language targeting multicore hardware.
|
| 2) "optimization" is not optional. Languages also will have to
| deal with strict aliasing and UB catastrophes. Compilers became
| real AGI of the industry. There are no smart developers
| outsmarting optimizing compilers anymore. You are either with the big
| compilers on optimization or your language performance is not
| relevant. Providing even some ways to control optimization is
| something sorely missed every time everything goes boom with a
| minor compiler update.
|
| 3) "hardware". If you need performance you need to go back to
| hardware not hide from it further behind abstract machines. C and
| C++ lack real control of anything hardware did since 1985.
| Performant code really needs to be able to have memory pages and
| cache lines and physical layout controls of machine code. Counter
| arguments that these hardware things are per platform and
| therefore outside of language are not really helping. Because
| they need to be per platform and available in the language.
|
| 4) "libc" is a problem. Most of it being used in newly written
| code has to be escalated straight to bug reporting tool. I used
| to think that C++ stl was going to age better but not anymore.
| Assumptions baked into old APIs are just not there anymore.
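|
| A small sketch for the cache-line point in 3), showing the little
| layout control that standard C11 does offer: `alignas` can keep
| per-thread counters off each other's cache line. The 64-byte line
| size and the `padded_counter` name are assumptions made up for
| illustration; the real line size is per platform and the language
| does not expose it, which is the complaint above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Assumed cache-line size; common on x86-64, not guaranteed anywhere. */
#define CACHE_LINE 64

/* Hypothetical per-thread counter. Aligning the member to CACHE_LINE also
 * pads the struct to CACHE_LINE bytes, so neighbouring array elements
 * never share a line (this avoids false sharing between threads). */
struct padded_counter {
    alignas(CACHE_LINE) unsigned long value;
};

/* Checked at compile time: one counter occupies exactly one assumed line. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "one counter per assumed cache line");

/* One counter per worker thread. */
static struct padded_counter counters[4];

/* Byte distance between two adjacent counters in the array. */
static size_t counter_stride(void)
{
    return (size_t)((char *)&counters[1] - (char *)&counters[0]);
}
```

| Anything beyond this (page placement, code layout) still needs
| compiler extensions or OS calls, outside the standard.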
|
| I guess it does not sound helpful or positive for any new
| language to deal with those things. I am pretty sure we can kick
| all those cans down the road if our goal is to keep writing
| software compatible with PDP that somehow limps in web browser
| (sorry bad attempt at joking).
| anqurvanillapy wrote:
| Exactly the kind of thoughts and insights I need from more of
| the users. Thank you for pointing out many concerns.
|
| > Headers.
|
| C++20 modules are left unstable and unused in major compilers
| there, but it's a standard. And C is ironically perfect for
| FFI, as I said, almost every programming language speaks C:
| Rust WebAssembly API is extern C, JNI in Java, every scripting
| language, even Go itself talks to OS solely using syscall ABI,
| foreign-function calls are only possible with Cgo. C was not
| just an application/systems language for some sad decades.
|
| > Big elephants.
|
| Since I was in the zoo watching tigers:
|
| Mostly three groups of people are served under a language:
| Application writers, library writers, compiler writers
| (language itself).
|
| I narrowed down and started "small" to see if people writing
| programs crossing kernel and user space would have more
| thoughts about C since it's the only choice. That's also my
| job, I made distributed block device (AWS EBS replacement)
| using SPDK, distributed filesystem (Ceph FS replacement) using
| FUSE, packet introspection module in router using DPDK. I know
| how it feels.
|
| Then for the elephants you mentioned, I see them more fitted
| into a more general library and application development, so
| here we go:
|
| > Threading.
|
| Async Rust is painful, Send + Sync + Pin, long signatures of
| trait bounds, no async runtimes are available in standard
| libraries, endless battles in 3rd party runtimes.
|
| I would prefer Go on such problems. Not saying goroutines and
| channels are perfect (stackful is officially the only choice,
| when goroutine stacks somehow become memory intensive, going
| stackless is only possible with 3rd party event loops), but
| builtin deadlock and race detection win much here. So it just
| crashes on violation, loops on unknown deadlocks, I would
| probably go to this direction.
|
| > Optimization, hardware.
|
| I don't quite understand why these concerns are "concerns" here.
|
| It's the mindset of having more known safer parts in C, like a
| disallow list, rather than under a strong set of rules, like in
| Rust, an allowlist (mark `unsafe` to be nasty). Not making
| everything reasonable, safe and generally smart, which is
| surreal.
|
| C is still, ironically again, the best language to win against
| assembly upon an optimizing performance, if you know these
| stories:
|
| - They increased 30% speed on CPython interpreter recently on
| v3.14.
|
| - The technique was known 3 years ago to be applied in LuaJIT-
| Remake, they remade a Lua interpreter to win against the
| original handwritten assembly version, without inline caching.
|
| - Sub-techniques of it have existed for more than a decade, even
| in the Haskell LLVM target, and in theory they existed before C
| was born.
|
| It is essentially just an approach to matching how the real
| abstract machine looks like underneath.
|
| > libc.
|
| Like I said, C is more than a language. One can switch in a
| new allocator algorithm behind malloc/free; Rust quit using
| jemalloc by default and just uses malloc instead. Libc is
| somewhat a weird de facto interface.
| SleepyMyroslav wrote:
| I guess I need to illustrate my points a bit because I never
| needed to poke kernels and my concerns are mostly from large
| games. I am trying to imagine writing large games in your
| language so please bear with me for a moment.
|
| >Modules
|
| Nobody plans to provide other interfaces to
| oses/middlewares/large established libraries. Economy is just
| not there.
|
| >Threading
|
| I was not talking about I/O at all. All of that you mention
| will be miles better in any high level language because
| waiting can be done in any language. Using threads for
| computation intensive things is a niche for low level
| languages. I would go further say that copying stuff around
| and mutexes also will be fine in high level languages.
|
| >Optimization/Hardware
|
| Is very important to me. I don't know how it was not relevant
| to your plan of fixing low level language. Here goes couple
| of examples to try to shake things up.
|
| The strlen implementation in glibc is not written in C. UB
| just does not allow implementing the same algorithm, because
| reading up until the memory page end is outside of the abstract
| machine. Also note how sanitizers are implemented to avoid
| checking strlen implementation.
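|
| To illustrate the strlen point: fast implementations typically
| scan a machine word at a time and test it for a zero byte with a
| bit trick. The sketch below stays within defined behavior only by
| assuming the caller's buffer size is a multiple of 8 bytes (the
| `padded_len` parameter and the function names are made up for this
| assumption). A real strlen has no such guarantee and has to read
| past the terminator, which is the part ISO C does not permit. This
| is not glibc's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero iff some byte of w is 0x00 (the classic "haszero" bit trick). */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Word-at-a-time strlen over a buffer whose size, padded_len, is a
 * multiple of 8 and which contains a terminating NUL. Because of that
 * assumption, every 8-byte read stays inside the buffer. */
static size_t strlen_padded(const char *s, size_t padded_len)
{
    size_t i = 0;
    assert(padded_len % 8 == 0);
    while (i < padded_len) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);  /* defined: no aliasing/alignment issue */
        if (has_zero_byte(w))
            break;                    /* the NUL is somewhere in this word */
        i += 8;
    }
    while (i < padded_len && s[i] != '\0')  /* finish byte by byte */
        i++;
    return i;
}
```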
|
| Pointer provenance that is both present in each major
| compiler and impossible to define atm. You need to decide if
| your language goes with abstract machine or gcc or clang or
| linux. None of them agree on it. A good attempt to add into C
| standard a logical model of pointer provenance did not
| produce any results. If you want to read up on that there
| was HN thread about it recently.
|
| >libc
|
| I am pretty sure I can't move you on that. Just consider
| platforms that need to use new APIs for everything and have
| horrendous 'never to be used' shims to be posix 'compatible'.
| Like you can compile legacy things but running it does not
| make sense. Games tend to run there just fine because games
| used to write relevant low level code per platform anyway.
| anqurvanillapy wrote:
| > Imagine writing large games in your language.
|
| You don't. Read the features I listed. One ends up with a C
| alternative frontend (Cfront, if you love bad jokes)
| including type system like Zig without any standard
| library. No hash tables, no vectors. You tended to write
| large games with this.
|
| Like I said the main 3 groups of users, if you're concerned
| about application writing, ask it. Rest of the comments
| talked about possible directions of langdev.
|
| > Modules.
|
| You write C++ and don't know what a standard is. Motivating
| examples, real world problems (full and incremental
| compilation, better compilation cache instead of
| precompiled headers), decades spent on discussions. Economy
| would come for projects with modern C++ features.
|
| > Threading.
|
| If you know Rust and Go, talk about them more. Go creates
| tasks and uses futexes, with bare-bone syscall ABI. Higher
| level primitives are easy to use. Tools and runtime are
| friendly to debugging.
|
| I wrote Go components with channels running faster than
| atomics with waits, in a distributed filesystem metadata
| server.
|
| On CPU intensiveness, I would talk about things like
| automatic vectorization, smarter boxing/unboxing, smarter
| memory layout (aka levity, e.g. AoS vs SoA). Not threading
| niche.
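|
| For readers unfamiliar with the AoS vs SoA remark, a minimal
| sketch of the two layouts in plain C. The particle fields and the
| function names are invented for illustration; the point is only
| that the SoA loop walks one dense array, which is the shape
| auto-vectorizers tend to handle more easily.

```c
#include <assert.h>
#include <stddef.h>

#define N 4

/* Array of structs: x, y and mass of one particle sit next to each other. */
struct particle { float x, y, mass; };

/* Struct of arrays: all x values are contiguous, then all y, then all mass. */
struct particles { float x[N], y[N], mass[N]; };

/* Sums mass with a stride of sizeof(struct particle) between loads. */
static float total_mass_aos(const struct particle *p, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += p[i].mass;
    return sum;
}

/* Sums mass over one dense float array (unit stride). */
static float total_mass_soa(const struct particles *p, size_t n)
{
    float sum = 0.0f;
    assert(n <= N);  /* the arrays hold at most N elements */
    for (size_t i = 0; i < n; i++)
        sum += p->mass[i];
    return sum;
}
```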
|
| > Strlen implementation and plan of low level programming.
|
| Because I keep talking about designing a general purpose
| language. One can also use LLVM IR to implement such
| algorithms.
|
| The design space here is to write these if necessary. Go
| source code is full of assembly.
|
| > Pointer provenance.
|
| Search for Andras Kovacs implementation of 2ltt in ICFP
| 2024 (actually he finished it in 2022), and his dtt-rtcg,
| you would realize how trivial these features could be
| implemented "for a new language". I design new languages.
|
| > libc.
|
| Like I said, your happy new APIs invoke malloc.
| SleepyMyroslav wrote:
| Good luck with metaprogramming. It looks cool.
|
| No worries, I got your message about target audience
| first time. It's just that language development for me is
| where I did some things. Langdev is an open ended
| problem. I wish I could express games needs without
| wasting time on things games don't care about.
| needlesslygrim wrote:
| > Async Rust is painful
|
| On the other hand, I've found normal threading in Rust quite
| simple (generally using a thread pool).
| PaulDavisThe1st wrote:
| > Just want you to think that threading has to be natural in
| any language targeting multicore hardware.
|
| parallel execution and thus parallel programming will never be
| natural to any human being. We don't do it, we can't think it
| except by using various cognitive props (diagrams, objects) to
| help us. You cannot make it natural no matter how strongly you
| desire it.
|
| Now, there is a different sort of "natural" which might mean
| something more like "idiomatic to other language forms and
| patterns", and that's certainly a goal that can be widely missed
| or closely approximated.
| pjc50 wrote:
| > So unlike a few `unsafe` in a safe Rust, I want something like
| a few "safe" in an ambient "unsafe" C dialect. But I'm not saying
| "unsafe" is good or bad, I'm saying that "don't talk about unsafe
| vs safe", it's C itself, you wouldn't say anything is "safe" or
| "unsafe" in C.
|
| Eh?
|
| The critical criterion is "does your language make it difficult
| to write accidental RCEs". There's huge resistance to changing
| language _at all_, as we can see from the kernel mailing lists,
| so in order to go through the huge social pain of encouraging
| people to use a different language it's got to offer real and
| significant benefits.
|
| Lifetimes are a solution to memory leaks and use-after free.
| Other solutions may exist.
|
| Generics: Go tried to resist generics. It was a mistake. You need
| to be able to do Container<T> somehow. Do you have an opinion on
| the dotnet version of generics?
|
| (You mention Ceph: every time I read about it I'm impressed, in
| that it seems an excellent solution to distributed filesystems,
| and yet I don't see it mentioned all that often. I'm glad it's
| survived)
| AlotOfReading wrote:
| The problem with "safe pockets in ambient unsafety" is that C and
| C++ intentionally disallow this model. It doesn't matter what you
| do to enforce safety within the safe block, the definition of
| Undefined Behavior means that code elsewhere in your program can
| violate any guarantees you attempt to enforce. The only ways
| around this are with a language that doesn't transpile to C and
| doesn't have undefined behavior like Rust, or a compiler that
| will translate C safely like zig attempts to do. Note that zig
| still falls short here with unchecked illegal behavior and rustc
| has struggled with assumptions about C's undefined behavior
| propagating into LLVM's backend.
| jjnoakes wrote:
| Safe pockets in ambient unsafety does have benefits though. For
| example, some code has a higher likelihood of containing
| undefined behavior (code that manipulates pointers and offsets
| directly, parsing code, code that deals with complex lifetimes
| and interconnected graphs, etc), so converting just that code
| to safe code would have a high ROI.
|
| And once you get to the point where a large chunk of code is in
| safe pockets, any bugs that smell of undefined behavior only
| require you to look at the code outside of the safe pockets,
| which hopefully decreases over time.
|
| There are also studies that show that newly written code tends
| to have more undefined behavior due to its age, so writing new
| code in safe pockets has a lot of benefit there too.
| mikexstudios wrote:
| Kind of along these lines but for C++: https://docs.carbon-
| lang.dev/
| arnsholt wrote:
| In 2014 John Regehr and colleagues suggested what he called
| Friendly C[0], in an attempt to salvage C from UB. A bit more
| than a year later, he concluded that the project wasn't really
| feasible because people couldn't agree on the details of what
| Friendly C should be.[1]
|
| In the second post, there's an interesting comment towards the
| end:
|
| > Luckily there's an easy away forward, which is to skip the step
| where we try to get consensus. Rather, an influential group such
| as the Android team could create a friendly C dialect and use it
| to build the C code (or at least the security-sensitive C code)
| in their project. My guess is that if they did a good job
| choosing the dialect, others would start to use it, and at some
| point it becomes important enough that the broader compiler
| community can start to help figure out how to better optimize
| Friendly C without breaking its guarantees, and maybe eventually
| the thing even gets standardized. There's precedent for
| organizations providing friendly semantics; Microsoft, for
| example, provides stronger-than-specified semantics for volatile
| variables by default on platforms other than ARM.
|
| I would argue that this has happened, but not quite in the way he
| expected. Google (and others) _has_ chosen a way forward, but
| rather than somehow fixing C they have chosen Rust. And from what
| I see happening in the tech space, I think that trend is going to
| continue: love it or hate it, the future is most likely going to
| be Rust encroaching on C, with C increasingly being relegated to
| the "legacy" status like COBOL and Fortran. In the words of
| Ambassador Kosh: "The avalanche has already started. It is too
| late for the pebbles to vote."
|
| 0: https://blog.regehr.org/archives/1180 1:
| https://blog.regehr.org/archives/1287
| Macha wrote:
| I think the problem with "friendly C", "safe C++" proposals is
| they come from a place of "I want to continue using what I know
| in C/C++ but get some of the safety benefits. I'm willing to
| trade some of the safety benefits for familiarity". The problem
| is the friendly C/safe C++ that people picture from that is on
| a spectrum. On one end you have people that really just want to
| keep writing C++98 or C99 and see this as basically a way to
| keep the network effects of C/C++ by having other people write
| C who wouldn't. The other extreme are people who are willing
| to significantly rework their codebases to this hypothetical
| safe C.
|
| The people on one end of this spectrum actually wouldn't accept
| any of the changes to meaningfully move the needle, while the
| people on the other end have already moved or are moving to
| Rust.
|
| Then in the middle you have a large group of people but not one
| that agrees on which points of compatibility they will give up
| for which points of safety. If someone just said "Ok, here's
| the standard variant, deal with it", they might adopt it... but
| they wouldn't be the ones invested enough to make it and the
| people who would make it have already moved to other languages.
| awesome_dude wrote:
| > Luckily there's an easy away forward, which is to skip the
| step where we try to get consensus.
|
| This is true, it's the Benevolent Dictator model versus the rule
| by committee model problem.
|
| Committees are notorious for having problems coming to a
| consensus, because everyone wants to pull in a different
| direction, often at odds with everyone else.
|
| Benevolent dictators get things done, but it's not necessarily
| what people want.
|
| And, we live in hope that they stay benevolent.
| melon_tusk wrote:
| This is a dream come true. Please do it, for the love of mankind.
| vmchale wrote:
| Have a look at ATS, it is memory-safe and designed for kernel
| development. There are kernel and Arduino examples. Fluent C
| interop.
|
| No tactics metaprogramming but it'll give you a start.
| gwbas1c wrote:
| There are plenty of attempts at "safe C-like" languages that you
| can learn from:
|
| C++ has smart pointers. I personally haven't worked with them,
| but you can probably get very close to "safe C" by mostly working
| in C++ with smart pointers. Perhaps there is a way to annotate
| the code (with a .editorconfig) to warn/error when using a
| straight pointer, except within a #pragma?
|
| > Just talk to the platform, almost all the platforms speak C.
| Nothing like Rust's PAL (platform abstraction layer) is needed. 2)
| Just talk to other languages, C is the lingua franca
|
| C# / .Net tried to do that. Unfortunately, the memory model
| needed to enable garbage collection makes it far too opinionated
| to work in cases where straight C shines. (IE, it's not practical
| to write a kernel in C# / .Net.) The memory model is also so
| opinionated about how garbage collection should work that C# in
| WASM can't use the proposed generalized garbage collector for
| WASM.
|
| Vala is a language that's inspired by C#, but transpiles to C. It
| uses the gobject system under the hood. (I guess gobjects are
| used in some linux GUIs, but I have little experience with it.)
| Gobjects, and thus Vala, are also opinionated about how automatic
| memory management should work, (In this case, they use reference
| counting.), but from what I remember it might be easier to drop
| into C in a Vala project.
|
| Objective C is a decent object-oriented language, and IMO, nicer
| than C++. It allows you to call C directly without needing to
| write bindings; and you can even write straight C functions mixed
| in with Objective C. But, like C# and Vala, Objective C's memory
| model is also opinionated about how memory management should
| work. You might even be able to mix Swift and Objective C, and
| merely use Objective C as a way to turn C code into objects.
|
| ---
|
| The thing is, if you were to try to retrofit a "safe C" inside of
| C, you have to be _opinionated about how memory management should
| work._ The value of C is that it has no opinions about how your
| memory management should work; this allows C to interoperate with
| other languages that allow access to pointers.
| neonsunset wrote:
| It's less so opinionated and more so that WASM GC spec is just
| bad and too rudimentary to be anywhere near enough for more
| sophisticated GC implementations found in JVM and .NET.
| gwbas1c wrote:
| It's been awhile since I skimmed the proposal. What I
| remember is that it was "just enough" to be compatible with
| Javascript; but didn't have the hooks that C# needs. (I don't
| remember any mentions about the JVM.)
|
| I remember that the C# WASM team wanted callbacks for
| destructors and type metadata.
|
| Personally, having spent > 20 years working in C#,
| destructors is a smell of a bigger problem; and really only
| useful for debugging resource leaks. I'd rather turn them off
| in the WASM apps that I'm working on.
|
| Type metadata is another thing that I think could be handled
| within the C# runtime: Much like IntPtr is used to
| encapsulate native pointers, and it can be encapsulated in a
| struct for type safety when working with native code, there
| can be a struct type used for interacting with non-C# WASM
| managed objects that doesn't contain type metadata.
| neonsunset wrote:
| Here's the issue which gives an overview of the problems:
| https://github.com/WebAssembly/gc/issues/77
|
| Further discussion can be found here:
| https://github.com/dotnet/runtime/issues/94420
|
| Turning off destructors will not help even a little because
| the biggest pain points are support for byref pointers and
| insufficient degree of control over object memory layout.
| pkkm wrote:
| I'm a lot less experienced than you, but since you're collecting
| ideas, I'll give my opinion.
|
| For me personally, the biggest improvements that could be made to
| C aren't about advanced type system stuff. They're things that
| are technically simple but backwards compatibility makes them
| difficult in practice. In order of importance:
|
| 1) Get rid of null-terminated strings; introduce native slice and
| buffer types. A slice would be basically _struct { T *ptr; size_t
| count; }_ and a buffer would be _struct { T *ptr; size_t count;
| size_t capacity; }_, though with dedicated syntax to make them
| ergonomic - perhaps _T ^slice_ and _T @buffer_. We'd also want
| buffer -> slice -> pointer decay, _beginof_ / _endof_ / _countof_
| / _capacityof_ operators, and of course good handling of type
| qualifiers.
|
| 2) Get rid of _errno_ in favor of consistent out-of-band error
| handling that would be used in the standard library and
| recommended for user code too. That would probably involve using
| the return value for a status code and writing the actual result
| via a pointer: _int do_stuff(T *result, ...)_.
|
| 3) Get rid of the strict aliasing rule.
|
| 4) Get rid of various tiny sources of UB. For example,
| standardize _realloc_ to be equivalent to _free_ when called with
| a length of 0.
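|
| As a rough illustration of points 1) and 2), here is what the
| slice and buffer layouts look like when written by hand in today's
| C. This is only a sketch: the names (int_slice, int_buffer,
| STATUS_*, buffer_push, ...) are made up for the example, T is
| fixed to int because C has no generic T, and the dedicated syntax
| and _countof_-style operators proposed above are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical types: the proposed slice and buffer, with T fixed to int. */
typedef struct { int *ptr; size_t count; } int_slice;
typedef struct { int *ptr; size_t count; size_t capacity; } int_buffer;

/* Hypothetical status codes, returned instead of setting errno. */
enum { STATUS_OK = 0, STATUS_OUT_OF_RANGE = 1, STATUS_NO_MEMORY = 2 };

/* buffer -> slice "decay": a view of the elements currently in use. */
int_slice buffer_as_slice(const int_buffer *buf)
{
    int_slice s = { buf->ptr, buf->count };
    return s;
}

/* Status code in the return value, actual result written via a pointer. */
int slice_get(int_slice s, size_t index, int *result)
{
    assert(result != NULL);
    if (index >= s.count)
        return STATUS_OUT_OF_RANGE;   /* *result is left untouched */
    *result = s.ptr[index];
    return STATUS_OK;
}

/* Append one element, doubling the capacity when the buffer is full. */
int buffer_push(int_buffer *buf, int value)
{
    if (buf->count == buf->capacity) {
        size_t new_cap = buf->capacity ? buf->capacity * 2 : 4;
        /* Reject arithmetic wrap-around before calling realloc. */
        if (new_cap <= buf->capacity || new_cap > SIZE_MAX / sizeof *buf->ptr)
            return STATUS_NO_MEMORY;
        int *p = realloc(buf->ptr, new_cap * sizeof *p);
        if (p == NULL)
            return STATUS_NO_MEMORY;  /* the old storage is still valid */
        buf->ptr = p;
        buf->capacity = new_cap;
    }
    buf->ptr[buf->count++] = value;
    return STATUS_OK;
}

/* Release storage with free(), sidestepping realloc(ptr, 0) entirely. */
void buffer_free(int_buffer *buf)
{
    free(buf->ptr);
    buf->ptr = NULL;
    buf->count = 0;
    buf->capacity = 0;
}
```

| A zero-initialized int_buffer is a valid empty buffer, because
| realloc(NULL, n) behaves like malloc(n). Every fallible call
| reports failure through its return value, so the caller never has
| to inspect errno.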
|
| Metaprogramming-wise, my biggest wish would be for a way to
| enrich programs and libraries with custom compile-time checks,
| written in plain procedural code rather than some convoluted
| meta-language. These checks would be very useful for libraries
| that accept custom (non- _printf_ ) format strings, for example.
| An opt-in linear type system would be nice too.
|
| Tool-wise, I wish there was something that could tell me
| definitively whether a particular run of my program executed any
| UB or not. The simpler types of UB, like null pointer
| dereferences and integer overflows, can be detected now, but I'd
| also like to know about any violations of aliasing and pointer
| provenance rules.
| ryao wrote:
| Here is a sound static analyzer that can identify all memory
| safety bugs in C/C++ code, among other kinds of bugs:
|
| https://www.absint.com/astree/index.htm
|
| You can use it to produce code that is semi-formally verified to
| be safe, with no need for extensions. It is used in the aviation
| and nuclear industries. Given that it is used only by industries
| where reliability is so important that money is no object, I
| never bothered to ask them how much it costs. Few people outside
| of those industries know that it exists. It is a shame that the
| open source alternatives only support subsets of what it
| supports. The computing industry is largely focused on unsound
| approaches that are easier to do, but do not catch all issues.
|
| If you want extensions, here is a version of C that relies on
| hardware features to detect pointer dereferences to the wrong
| places through capabilities:
|
| https://github.com/CTSRD-CHERI/cheri-c-programming
|
| It requires special CHERI hardware, although the hardware does
| exist.
| AlotOfReading wrote:
| Astree is a pain in the butt. Even if it were free, I'd
| recommend it to very few people. It's not usable without
| someone (often a team) being responsible for it full time.
|
| TrustInSoft is the higher quality option, Polyspace is the more
| popular option, and IKOS is probably the best open source
| option. I've also had luck with tools from Galois Inc and the
| increasingly dated rv-match tool.
| fxtentacle wrote:
| I believe what programmers actually want is clean dialect-free C
| with sidecar files.
|
| It seems people pretty universally dislike type annotations and
| overly verbose comments, like Ruby's YARD or Java's Javadoc.
| Also, if your new language doesn't compile with a standard C
| compiler, kernel usage is probably DOA. That means you want to
| keep the source code pure C and store additional data in an
| additional file. That additional file would then contain stuff
| like pointer type annotations, object lifecycle and lifetime
| hints, compile-time eval hints, and stuff to make the macros type
| safe. Ideally, your tool can then use the C code and the sidecar
| file together to prove that the C code is bug-free and that
| pointers are handled correctly. That would make your language as
| safe as Rust to use.
|
| The hardcore C kernel folks can then just look at the C code and
| be happy. And you and your users use a special IDE to modify the
| C code and the sidecar file simultaneously, which unlocks all the
| additional language features. But as soon as you hit save, the
| editor converts its internal representation back into plain C
| code. That means, technically, the sidecar file and your IDE are
| a fancy way of transpiling from whatever you come up with to pure
| C.
| muricula wrote:
| You may be interested in the new Clang -fbounds-safety extension
| https://clang.llvm.org/docs/BoundsSafety.html
| bachmeier wrote:
| You mentioned D, but are you familiar with D's BetterC?
|
| https://dlang.org/spec/betterc.html
|
| The goal with BetterC is to write D code that's part of a C
| program. There's no runtime, no garbage collector, or any of
| that. Of course you lose numerous D features, but that's kind of
| the point - get rid of the stuff that doesn't work as part of a C
| program.
| viraptor wrote:
| Here's the thing... There have been many of them and they all die
| because they don't provide enough benefit over the status quo.
| Cyclone
| (https://en.wikipedia.org/wiki/Cyclone_(programming_language)) is
| probably the best known one. There's also Safe C
| (https://www.safe-c.org/). A bit further from just a "dialect",
| there are OOC (https://ooc-lang.github.io/) and Vala
| (https://vala.dev/).
|
| But the only thing that really took off was the effort to change
| things at the very base level rather than patch issues: Rust,
| Zig, Go.
| ebiederm wrote:
| If the goal is something that can be used to improve existing C
| code, I have a few thoughts.
|
| To get to memory safety with C:
|
| - Add support for array bounds checking. Ideally with the
| compiler doing the heavy lifting and proving to itself that
| most runtime bounds checks are unnecessary.
|
| - Implement trivial dependent types so the compiler can know
| about the array size field that is passed next to a pointer. AKA
|
| void do_something(size_t size, entry_t ptr[size]);
|
| - Enforce the restrict keyword. This is actually the tricky bit.
| I have some ideas for a language that is not C, but making it
| backwards compatible is beyond where I have gotten. My hint is
| separation logic.
|
| - Allow types to change safely. So that free() can change the
| type of the pointer passed to it, to be a non-dereferencable
| pointer (whatever bits it has).
|
| This is an idea from separation logic.
|
| Allowing functions to change types of data safely could also be a
| safe solution to code that needs type punning today.
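|
| For reference, the do_something signature above is already
| accepted as C99 syntax by GCC and Clang (a variably modified
| parameter), but the parameter is adjusted to a plain pointer and
| the standard does not require the bound to be checked. A minimal
| sketch of what the proposal would build on, with the check a
| bounds-checking compiler might insert written out by hand; the
| entry_t layout and the function names are assumptions made up for
| the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element type; the original comment leaves entry_t undefined. */
typedef struct { int key; int value; } entry_t;

/* The size parameter comes first so that it is in scope for ptr[size]. */
int sum_values(size_t size, const entry_t ptr[size])
{
    int total = 0;
    /* Every index here is provably below size, so no runtime check is needed. */
    for (size_t i = 0; i < size; i++)
        total += ptr[i].value;
    return total;
}

/* An index the compiler cannot prove in range: the check is written by hand.
 * Returns 0 on success and -1 if index is out of bounds. */
int get_value(size_t size, const entry_t ptr[size], size_t index, int *result)
{
    assert(result != NULL);
    if (index >= size)
        return -1;
    *result = ptr[index].value;
    return 0;
}
```

| In sum_values the loop condition already implies the bound, which
| is the kind of check a compiler could prove unnecessary. In
| get_value it cannot, so a runtime test remains.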
|
| I think conceptually modules are great, but if your goal is
| source compatible changes that bring memory safety then something
| like modules is an unnecessary distraction.
|
| Any changes that ultimately cannot be implemented in the default
| C compiler I don't think will be preferable to just rewriting the
| code in a more established language like Rust.
|
| On the other hand I think we are at a local maximum with
| programming languages and type systems, with everyone busy
| recombining proven techniques in different ways instead of
| working on the hard problem of how to have assignment, threading,
| and memory safety, plus how to do proofs of interesting program
| properties with things like asserts.
|
| Unfortunately it appears that only through proof can programs be
| consistent enough that specific security concerns can be said to
| not be problems.
|
| What I have seen of Ada SPARK lately has been very tantalizing.
|
| I have a personal project in which I think I have solved the
| memory safety problem while still allowing manual memory
| management and assignment. Unfortunately I am at a stage where
| everything is mostly clear in my head, but I haven't finished
| fleshing it out and proving the type system, so I really can't
| share it yet :-(.
|
| While implementing modules, memory safety, type variables, and
| functions that can change the types of their argument pointers, I
| think I will end up with something simpler than C in most
| respects.
|
| I keep going "well, that doesn't make any sense today" as I go
| through all of the details and ask why something is done the way
| it is done.
|
| One of those questions is why doesn't C use modules.
| 1718627440 wrote:
| In my opinion SPLint (http://splint.org/) would be a nice
| approach. It is a way to specify ownership semantics, inout
| parameters etc., but it also allows specifying arbitrary pre- and
| postconditions. It works by annotating whole functions, their
| parameters, types and variables. These are then checked by
| running splint on the codebase; you can also opt out of several
| checks with flags or by using the preprocessor.
|
| - nullability: /*@null@*/
| - in/out parameter (default in): /*@inout@*/, /*@out@*/
| - ownership: /*@only@*/, /*@temp@*/, /*@shared@*/, /*@refcounted@*/
| - also supports partially defined parameters
| - can be introduced gradually in the codebase
|
| Example from the documentation:
|
|     void * /*@alt char * @*/
|     strcpy (/*@unique@*/ /*@out@*/ /*@returned@*/ char *s1, char *s2)
|       /*@modifies *s1@*/
|       /*@requires maxSet(s1) >= maxRead(s2) @*/
|       /*@ensures maxRead(s1) == maxRead (s2) @*/;
|
| My main problems were that it was annoying to add to a project
| (only because you need to specify ownership semantics, not
| because of the syntax, which is short and readable), that the
| program sometimes crashes, and that there doesn't seem to be
| active development.
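|
| To show why gradual adoption is possible: the annotations are
| ordinary C comments, so an annotated file still builds with any
| standard C compiler. A small sketch using only annotations from
| the list above; the functions copy_string and measure are
| hypothetical, the annotation placement reflects my reading of the
| splint manual, and the file has not been run through splint here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* only: the caller owns the returned storage and must free it.
 * null: the result may be NULL and has to be checked before use. */
/*@null@*/ /*@only@*/ char *copy_string(/*@temp@*/ const char *src)
{
    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *dst = malloc(len);
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, len);
    return dst;
}

/* out: *length is written by the function and need not be initialized.
 * temp: s is only borrowed for the duration of the call. */
int measure(/*@temp@*/ const char *s, /*@out@*/ size_t *length)
{
    assert(length != NULL);
    *length = strlen(s);
    return 0;
}
```

| A plain compiler sees two normal functions. A checker that
| understands the comments has enough information to complain when
| the result of copy_string is used without a NULL check or is never
| freed.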
___________________________________________________________________
(page generated 2025-02-25 23:01 UTC)