_______ __ _______
| | |.---.-..----.| |--..-----..----. | | |.-----..--.--.--..-----.
| || _ || __|| < | -__|| _| | || -__|| | | ||__ --|
|___|___||___._||____||__|__||_____||__| |__|____||_____||________||_____|
on Gopher (inofficial)
(HTM) Visit Hacker News on the Web
COMMENT PAGE FOR:
(HTM) Show HN: A small programming language where everything is pass-by-value
electroly wrote 32 min ago:
My hobby language[1] also has no reference semantics. I think this is a
really interesting point in the design space. A lot of complexity goes
away when it's only values, and there are real languages like classic
APL that work this way. But there are some serious downsides.
In practice I have found that it's very painful to thread state through
your program. I ended up offering global variables, which provide
something similar but worse than generalized reference semantics. My
language aims for simplicity so I think this may still be a good
tradeoff, but it's tricky to imagine this working well in a larger user
codebase.
I like that having only value semantics allows me, internally, to use
reference counted immutable objects to cut down on copying; you can
pass-by-reference internally and present it as pass-by-value to the
programmer. No cycle detection needed because it's not possible to
construct cycles. I use an immutable data structures library[2] so that
modifications are reasonably efficient. I recommend trying that in
Herd; it's better than copy-on-write. Think about the Big-O of
modifying a single element in a massive array, or building up a list by
repeatedly appending to it. With pure COW it's hard to have a large
array at all--it takes too long to build it!
But for the programmer, this, too, can actually be a negative.
Sometimes people want circular linked lists, or to implement custom
data structures. It's tough to build new data structures in a language
without reference semantics. For the most part, the programmer has to
simulate them with arrays. This works for APL because it's an array
language, but my BASIC has less of an excuse. [1]
(HTM) [1]: https://tmbasic.com
(HTM) [2]: https://github.com/arximboldi/immer
jasperry wrote 1 hour 7 min ago:
Syntax comment: in your control structures you use a keyword ("do",
"then") to start a block as well as wrapping the block in parentheses.
This feels superfluous. I suggest sticking with either keywords or
parens to delineate blocks, not both.
anacrolix wrote 2 hours 40 min ago:
Nobody has heard of persistent data structures?!
tromp wrote 5 hours 19 min ago:
This sounds quite similar to pure functional languages like Haskell,
where a function call cannot have any side effect.
But those go further in that they don't even have any mutable data.
Instead of
var foo = { a: 1 };
var bar = foo; // make a copy of foo
set bar.a = 2; // modify bar (makes a copy)
Haskell has
foo = Baz { a = 1 }
bar = foo { a = 2 } // make a modified copy of foo
jcparkyn wrote 4 hours 48 min ago:
Personally I think local mutability is quite a useful property, which
was part of the inspiration for making this instead of just another
pure functional language:
- All functions are still referentially transparent, which means we
get all the local reasoning benefits of pure functions.
- We can mutate local variables inside loops (instead of just
shadowing bindings), which makes certain things a lot easier to write
(especially for beginners).
- Mutating nested fields is super easy: `set foo.bar[0].baz = 1;`
(compare this to the equivalent Haskell).
tylerhou wrote 7 hours 51 min ago:
You should check out Perceus!
(HTM) [1]: https://www.microsoft.com/en-us/research/wp-content/uploads/20...
Panzerschrek wrote 7 hours 52 min ago:
But what if mutation is intended? How to pass a mutable reference into
a function, so that it can change the underlying value and the caller
can observe these changes? What about concurrent mutable containers?
jcparkyn wrote 4 hours 16 min ago:
> How to pass a mutable reference into a function, so that it can
change the underlying value and the caller can observe these changes?
Just modify the value inside the function and return it, then assign
back. This is what the |= syntax is designed for. It's a bit more
verbose than passing mutable references to functions but it's
actually functionally equivalent.
Herd has some optimisations so that in many cases this won't even
require any copies.
> What about concurrent mutable containers?
I've considered adding these, but right now they don't exist in Herd.
netbioserror wrote 8 hours 42 min ago:
Nim has a similar, strong preference for value semantics. However, its
dynamic heap types (strings, seqs, tables) are all implemented as
wrappers that hide the internal references and behave with value
semantics by default, unless explicitly escape hatched. It makes it
incredibly easy to manipulate almost any data in a functional,
expression-oriented manner, while preserving the speed and efficiency
of being backed by a doubling array-list.
travisgriggs wrote 10 hours 21 min ago:
Curious if erlang/elixir isnât the same sort of thing? Or am I
misunderstanding the semantics of âpass by valueâ?
throwaway17_17 wrote 7 hours 52 min ago:
Your assumption is somewhat correct, for both Erlang and Elixir,
however the phrase under discussion doesnât mean the same thing for
immutable languages. Both are âpass-by-valueâ but that term is
being overloaded in a particular way. As I said in another comment,
âvalueâ in the language from TFA means any object that IS NOT a
reference. The qualifier that every semantic object is a âvalueâ
and that therefore, all arguments to a function call, threads spawn,
etc are independent values which are (logically, at least) copied to
new values that are then passed to the new context.
However, for Erlang and Elixir âpass-by-valueâ is otherwise
called âcall-by-valueâ. In this case, it is a statement that
arguments to functions are evaluated before they are passed into the
function (often at the call site). This is in opposition to
âcall-by-name/needâ (yes, I know they arenât the same) which
is, for instance, how Haskell does it for sure, and I think Python is
actually âby-nameâ as well.
So, Herdâs usage here is a statement of semantic defaults (and the
benefits/drawbacks that follow from those defaults) for arguments to
functions, and Elixirâs usage is about the evaluation order of
arguments to functions, they really arenât talking about the same
thing.
Interestingly, this is also a pair of separate things, which are both
separate from what another commenter was pedantically pointing out
elsewhere in the thread. Programming language discussion really does
seem to have a mess of terminology to deal with.
zem wrote 11 hours 15 min ago:
the pipe-equal operator is pretty neat, don't think I've seen any other
language do that.
throwaway17_17 wrote 7 hours 40 min ago:
I wrote the following and then realized maybe it is just a quirk of
the example in the reader that the âsetâ/â=â pair comes at
the end of the chain. If so, it is just a unique syntax sugar for a
function, I donât think it is, so Iâm leaving my comment as I
wrote it letting this act as a caveat:
Although I donât particularly like the â|â to be used for
chaining functions, I certainly know that it has been a long term
syntax coming from Unix. My only issue with the â|=â is that it
should be unnecessary. The only reason I can see that the special
operator is required is that the âsetâ/â=â syntax pair is a
non-functional keyword âsetâ with an (I think) overloaded keyword
â=â. If the equal sign was an ordinary function (i.e. a function
that take a value, and an identifier, associates the value and the
identifier, then returns the new value like the Lisps and derived
lands) it could just be used arbitrarily in chains of functions.
augusteo wrote 12 hours 24 min ago:
The threading story here is what grabbed my attention. Pass-by-value
with copy-on-write means you get data-race immunity without any locks
or channels. You just pass data to a thread and mutations stay local.
That's a genuinely useful property.
I've worked on systems where we spent more time reasoning about shared
state than writing actual logic. The typical answer is "just make
everything immutable" but then you lose convenient imperative syntax.
This sits in an interesting middle ground.
Curious about performance in practice. Copy-on-write is great until you
hit a hot path that triggers lots of copies. Have you benchmarked any
real workloads?
zahlman wrote 10 hours 39 min ago:
Why exactly is imperative syntax "convenient" specifically in the
context of inter-thread communication?
ddtaylor wrote 10 hours 10 min ago:
He's likely referencing that you would need to use different syntax
and style, like re-assigning a variable or chaining calls, like
when working with a String in Java.
In C, you can simply mutate the underlying characters. So changing
the fifth character in a string is as easy as:
str[4] = 0;
Whereas using the immutable syntax you might have to do something
like:
str = str.substr(0, 4) + "\0" + str.substr(4);
zahlman wrote 9 hours 38 min ago:
Well, yes, that's how it becomes convenient in general. But why
would you be doing things like that when communicating between
threads?
rao-v wrote 11 hours 19 min ago:
Why donât we just do this by default for threading in most
languages? Itâs pretty rare for me to actually want to do memory
sharing while threading (mostly because of the complexity)
vbezhenar wrote 7 hours 10 min ago:
Because it's super slow and shared memory is super fast. And people
generally prefer fast code rather than safe code.
gf000 wrote 1 hour 28 min ago:
It's not "super slow" and most languages do something very
similar within concurrent data structures.
Also, copy by value in itself is just a semantic requirement, it
doesn't say how it's implemented.
And shared mutable memory is pretty damn slow (given you are not
fine with data race garbage), because atomic operations destroy
caches. So it's the usual space-time tradeoff at the end of the
day.
jagged-chisel wrote 11 hours 28 min ago:
Pass-by-value is already a copy.
jcparkyn wrote 11 hours 25 min ago:
It's only semantically a pass-by-value, in reality a reference is
passed and the data is only copied if needed (i.e. value is mutated
while shared).
zahlman wrote 10 hours 36 min ago:
So the language has reference semantics, and (per the edit) for
every object (like in Python)?
(Ah, no, your example elsewhere in the thread suggests that the
referred-to structures get implicitly copied all over the place.)
jcparkyn wrote 10 hours 20 min ago:
Nope, it's value semantics (unlike Python), the references are
just an internal optimization. Implicit copies happen when a
list/dict with more than one reference is mutated.
zahlman wrote 9 hours 36 min ago:
> the references are just an internal optimization
Optimization specifically for function calls, or... ?
Because if you're doing all this copy-on-write anyway, the
indirection seems to be a needless cost in other contexts.
jcparkyn wrote 4 hours 27 min ago:
This applies everywhere, and it fundamentally wouldn't be
possible for just function calls.
> needless cost
Are you comparing to a language with mutable references or
a our functional language? A language with mutable
references will of course be faster, but this is more
intended as an alternative to pure functional languages
(since functions are referentially transparent).
In this case, the cost of the indirection is approximately
zero (relative to the cost of just doing reference
counting), since passing a reference just requires a bump
to the refcount. And most of the time the refcount
increments are skipped by "moving" instead of copying the
reference.
jcparkyn wrote 11 hours 29 min ago:
> Have you benchmarked any real workloads?
Nothing "real", just the synthetic benchmarks in the ./benchmarks
dir.
Unnecessary copies are definitely a risk, and there's certain code
patterns that are much harder for my interpreter to detect and
remove. E.g. the nbodies has lots of modifications to dicts stored in
arrays, and is also the only benchmark that loses to Python.
The other big performance limitation with my implementation is just
the cost of atomic reference counting, and that's the main tradeoff
versus deep cloning to pass between threads. There would definitely
be room to improve this with better reference counting optimizations
though.
wging wrote 9 hours 56 min ago:
There is some prior work on mitigating the performance cost of
immutability that you might be interested in. For example,
Clojure's persistent vectors allow fast modifications without
destroying the original vector, because internally they're wide
trees rather than just linear arrays of memory. This allows for
assignments to be implemented without a copy of the full vector.
(HTM) [1]: https://hypirion.com/musings/understanding-persistent-vect...
sheepscreek wrote 11 hours 55 min ago:
Hmm this is a bit like peeling a banana only to throw the banana and
eat the peel. Pass by value reduces the true benefit of
copy-on-write.
Use immutable pass by reference. Make a copy only if mutability is
requested in the thread. This makes concurrent reads lock-free but
also cuts down on memory allocations.
postepowanieadm wrote 56 min ago:
Peels are rich in fiber.
doug-moen wrote 11 hours 31 min ago:
I think that what you are calling "immutable pass by reference" is
what the OP is calling "pass by value". See, when used abstractly,
"pass by value" means that the argument is passed as a value, hence
it is immutable and the callee can't mutate it. One way to
implement this is by copying the data that represents the value. In
the OP's language, and in many other languages that work this way,
instead of copying the data, we implement "pass by value" by
incrementing the reference count and passing a pointer to the
original data. These differing implementations provide the same
abstract semantics, but differ in performance.
jcparkyn wrote 11 hours 35 min ago:
> Use immutable pass by reference. Make a copy only if mutability
is requested in the thread.
This is essentially what Herd does. It's only semantically a pass
by value, but the same reference counting optimizations still
apply.
In fact, Herd's approach is a bit more powerful than this because
(in theory) it can remove the copy entirely if the caller doesn't
use the old value any more after creating the thread. In practice,
my optimizations aren't perfect and the language won't always
detect this.
The big downside is that we have to use atomic reference counts for
_everything_. From memory this was about a 5-15% performance hit
versus non-atomic counters, though the number might be higher if
other bottlenecks were removed.
jbritton wrote 12 hours 27 min ago:
The article mentions shallow copy, but does this create a persistent
immutable data structure? Does it modify all nodes up the tree to the
root?
jcparkyn wrote 11 hours 56 min ago:
Yes, if you modify a nested dict/list entry, all nodes above it will
be cloned. Here's an example:
x = [1, [2]];
var y = x;
set y.[0] = 3; // clones the outer array, keeps a reference to
inner array
set y.[1].[0] = 4; // clones the inner array here. Outer array is
now exclusive so it doesn't need another clone.
var z = x;
set z.[1].[0] = 4; // clones both arrays at once
drnick1 wrote 12 hours 58 min ago:
Small programming language with everything passed by value? You
reinvented C?
jcparkyn wrote 12 hours 49 min ago:
Not everything in C is pass-by-value. Sure, you can argue that a
pointer itself is passed by value, but the data it points to is
definitely not.
globalnode wrote 11 hours 35 min ago:
cool project. can you take the address of a variable in some way?
i.e. implement your own pointers if its really really needed?
jcparkyn wrote 10 hours 9 min ago:
> can you take the address of a variable in some way?
I intentionally didn't add this, mostly because I wanted to
explore how far you can get without it (and keep the language
simple). Having a "real" pointer as a first class type wouldn't
work though, since it would break a lot of the assumptions I use
for optimizations.
I did think about two different versions of this but didn't end
up adding either:
- Something like `inout` parameters in Swift, which aren't first
class pointers. This is really just an alternate syntax for
returning multiple values.
- A "ref" type, which is essentially a mutable container for an
arbitrary value. Passing the container around would share a
reference to the same mutable value. This still wouldn't allow
modifying values "outside" of the container though.
globalnode wrote 7 hours 47 min ago:
you're right, once indirection appears pandoras box opens up.
keep it as pass by value only, it makes the language unique.
although the desire for pointers will never go away, people
have used them for so long now.
throwaway17_17 wrote 7 hours 30 min ago:
I think your statement about familiarity is spot on. One of
the âpromisesâ of the early functional language efforts
(late 70s through mid 90s) was that the invariants made it
conceptually simple to have a magical, super compiler take
the code and produce binary executables that would surpass
imperative languages, due to the aforementioned Pandoraâs
box effect of the non-functional constructs. Universal value
semantics are similar, especially without leaky abstractions
allowing âplease just let me crack the box, Iâll be good
and wonât abuse the powerâ. Commitment to the invariants,
in this case purely (logically, at least) value semantics
across the entire language could produce executables that
approach low-level, less restrictive code results with a
much, much lower burden on the programmer.
fjfaase wrote 13 hours 1 min ago:
I have implemented similar behavior in some of my projects. For one, I
also have also implemented 'cursors' that point to some part of a value
bound to a variable and allow you to change that part of the value of
the variable. I have used this to implement program transformations on
abstract parse (syntax) trees [1]. I also have implemented a dictionary
based on a tree where only part of the tree is modified that needs to
be modified [2]. I have also started working on a language that is
based on this, but also attempts to add references with defined
behavior [3]. [1] [2]
(HTM) [1]: https://github.com/FransFaase/IParse/?tab=readme-ov-file#markd...
(HTM) [2]: https://www.iwriteiam.nl/D1801.html#7
(HTM) [3]: https://github.com/FransFaase/DataLang
rvba wrote 13 hours 28 min ago:
> In herd, everything is immutable unless declared with var
So basucally everything is var?
jcparkyn wrote 13 hours 13 min ago:
I'm not sure if I understand the question?
There are two ways to define a variable binding:
x = 1; // declares x as immutable
var y = 2; // declares y as mutable
The "default" behaviour (if no keyword is used) is to define a new
immutable variable.
ekipan wrote 13 hours 29 min ago:
(Edit: in the old post title:) "everything is a value" is not very
informative. That's true of most languages nowadays. Maybe "exclusively
call-by-value" or "without reference types."
I've only read the first couple paragraphs so far but the idea reminds
me of a shareware language I tinkered with years ago in my youth,
though I never wrote anything of substance: Euphoria (though nowadays
it looks like there's an OpenEuphoria). It had only two fundamental
types. (1) The atom: a possibly floating point number, and (2) the
sequence: a list of zero or more atoms and sequences. Strings in
particular are just sequences of codepoint atoms.
It had a notion of "type"s which were functions that returned a boolean
1 only if given a valid value for the type being defined. I presume it
used byte packing and copy-on-write or whatever for its speed boasts.
[1] -
(HTM) [1]: https://openeuphoria.org/
(HTM) [2]: https://rapideuphoria.com/
jcparkyn wrote 12 hours 36 min ago:
Thanks, I updated the post title based on this and another comment.
Thanks for the pointer to Euphoria too, looks like an interesting
language with a lot of similar ideas.
p1necone wrote 13 hours 13 min ago:
> It had a notion of "type"s which were functions that returned a
boolean 1 only if given a valid value for the type being defined.
I've got a hobby language that combines this with compile time code
execution to get static typing - or I should say that's the plan,
it's really just a tokenizer and half of a parser at the moment - I
should get back to it.
The cool side effect of this is that properly validating dynamic
values at runtime is just as ergonomic as casting - you just call the
type function on the value at runtime.
discarded1023 wrote 13 hours 31 min ago:
At the risk of telling you what you already know and/or did not mean to
say: not everything can be a value. If everything is a value then no
computation (reduction) is possible. Why? Because computation stops at
values. This is traditional programming language/lambda calculus
nomenclature and dogma. See Plotkin's classic work on PCF (~ 1975) for
instance; Winskel's semantics text (~ 1990) is more approachable.
Things of course become a lot more fun with concurrency.
Now if you want a language where all the data thingies are immutable
values and effects are somewhat tamed but types aren't too fancy etc.
try looking at Milner's classic Standard ML (late 1970s, effectively
frozen in 1997). It has all you dream of and more.
In any case keep having fun and don't get too bogged in syntax.
DemocracyFTW2 wrote 5 hours 39 min ago:
Excuse me if I didn't get it right, but as a practical example, I'd
assume that I can rewrite every program into JavaScript using the
usual control structures and, beyond that, nothing but string values
(which are immutable). Simple arithmetic would already be kind of a
chore but can be done. (Input and output already happens only via
serialized values (cf also webworkers) so there's that; for
convenience use TypedArrays wrapped in classes that shield you from
immutability). It is not obvious to me where `a = '[1,2]'; a1 =
JSON.stringify( JSON.parse( a ).push( 3 ) ) );` is fundamentally
different from just pushing a value to a copy of `a`. Also, you could
write `a1 = a.slice(0,-1) + '3]'` which only uses non-mutating stuff
under the hood.
bayesnet wrote 10 hours 18 min ago:
IMHO this is both unnecessarily pedantic and not really quite right.
Letâs say we accept the premise that âeverything is a valueâ
means reduction is impossible. But a value is just the result of
reducing a term until it is irreducible (a normal form). So if there
is no reduction there canât really be values eitherâthere is just
âproseâ (syntax) and you might as well read a book.
doug-moen wrote 12 hours 56 min ago:
I am unable to extract any meaning from your post. You appear to be
making a general claim: it is impossible to design a programming
language where everything is a value. You at least admit that "data
thingies" can be values. Are you claiming that it is not possible for
functions to be values? (If we assume that the argument and the
result of a function call is a value, then this would mean higher
order functions are impossible, for example.) If not that, then what?
Please give a specific example of something that can never be a value
in any programming language that I care to design.
gf000 wrote 12 hours 4 min ago:
I think parent means it from a lambda calculus perspective. If you
only have values at an AST level, then you only have a tree of..
values, like an XML document.
You can apply meaning to a particular shape of that tree which
could be executed, but then you basically just added another layer
before you parse your AST that becomes executable.
jcparkyn wrote 13 hours 17 min ago:
Thanks, some interesting reading there that I will check out (I
wasn't aware of PCF). Perhaps I should've used more precise wording:
"All types are value types".
> Standard ML [...] It has all you dream of and more
The main thing here that's missing in Standard ML (and most other
functional languages) is the "mutable" part of "mutable value
semantics" - i.e., the ability to modify variables in-place (even
nested parts of complex structures) without affecting copies. This is
different from "shadowing" a binding with a different value, since it
works in loops etc.
throwaway17_17 wrote 8 hours 4 min ago:
Quick note then a more wordy response (and after being dinged in
another thread yesterday, the tl;dr is your usage is correct,
ignore purposely avoiding context criticism of wording):
SML has mutation, but only for Ref Cells, which humorously are
values themselves. Not thatâs what youâre really talking about
here.
Now for the wordy partâ¦
As another of your sibling commenters said GP was being incredibly
pedantic. While his reference to Plotkin/Winskel and the PCF thread
of research is formative, it is of particular note for Structural
Operational Semantics.
The real issue GP is raising that in programming language semantics
there are two distinct ways in which the term âvalueâ is used.
Worse still is that the terms are not just distinct but are broadly
from two very distinct fields in PLT. So, what are the two uses:
1) when the domain of discourse is Operational Semantics of
Programming Languages, a âvalueâ is a term in the formal
abstract syntax of the language which is not subject to reduction
via any other transition defined over the syntax;
2) when the domain of discourse is the Evaluation and Default
Kinds of arguments passed to functions and the semantic
implications of those defaults, a âvalueâ is defined in the
negative as those semantic objects which ARE NOT references [1];
which from the definitions alone, it is clear your designation of
your language as âpass-by-valueâ is a distinct thing from
GPâs usage of the term.
While GPâs usage of âvalueâ is in line with a long standing
tradition, that tradition is firmly within the academic/functional
programming language syntax and semantics sphere. Your usage, as is
PLAINLY APPARENT from context, is absolutely correct and in line
with long standing usage within discussion (and academic work) on
imperative (and otherwise non-functional) programming language
semantics. So keep phrasing discussions about Herd using the
âpass-by-valueâ and âeverything is a valueâ. Itâs not
only contextually correct and historically justified, it is utterly
within the âvernacularâ of normal programming language
discussions.
One last thought, your languageâs adoption of totally defaulting
to passing arguments (to functions, threads, basically any control
construct), with copy-on-write being an optimization only, should
make implementing a non-reified linearity qualifier on types
relatively simple to implement, that would address some of the
locking anti-optimizations and address some of your static analysis
that you mentioned where not working 100%.
âââââââ
1: Reference here include Rust/C++ style lightly abstracted
references, nearly zero abstraction pointers, and then the
references as discussed when talking about Python, Ruby,
JavaScript, C++ smart pointers, Rustâs higher level abstractions
over references, etc which are a very abstract concept of
reference.
(DIR) <- back to front page