[HN Gopher] Using {blocks} in Rust and Go for fun and profit
___________________________________________________________________
Author : surprisetalk
Score : 106 points
Date : 2023-01-25 13:24 UTC (9 hours ago)
(HTM) web link (taylor.town)
| infogulch wrote:
| This is one of my favorite features of rust, especially since you
| can return a value from a block since the language is 'expression
| oriented'.
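|
| A minimal Rust sketch of what that looks like (the function and
| variable names are made up for illustration): the block's final
| expression, written without a trailing semicolon, becomes the
| value bound to `area`, while the helper bindings stay inside it.

```rust
/// Hypothetical helper: computes an area inside a block expression.
fn area_label(width: u32, height: u32) -> String {
    let area = {
        // `w` and `h` exist only inside this block.
        let w = width.max(1);
        let h = height.max(1);
        w * h // no semicolon: this is the block's value
    };
    format!("area = {area}")
}
```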
| mirekrusin wrote:
| Yes, not having everything-is-expression in js hurts.
| indeyets wrote:
| There's a dormant proposal about this:
| https://github.com/tc39/proposal-do-expressions
| allo37 wrote:
| I do this quite a bit in C and C++; I find it's a great way to
| reduce the mental effort required to understand a long function
| without having to jump around the source like when it is broken
| up into multiple functions.
|
| In C++ you can really (ab)use it to do things like scoped mutex
| locks and "stopwatches" that start a timer on construction and
| print the elapsed time on destruction.
|
| Some people find it a bit bizarre, though; to each his/her own, I
| guess.
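|
| The same stopwatch idea in Rust, as a hedged sketch: `Drop` plays
| the role of the C++ destructor, so the timer reports when the
| block that owns it ends. `Stopwatch` and its `finished` log are
| hypothetical names used only for this example.

```rust
use std::cell::RefCell;
use std::time::Instant;

/// Hypothetical RAII stopwatch: starts on construction, reports on drop.
struct Stopwatch<'a> {
    label: &'static str,
    start: Instant,
    finished: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Stopwatch<'a> {
    fn new(label: &'static str, finished: &'a RefCell<Vec<&'static str>>) -> Self {
        Stopwatch { label, start: Instant::now(), finished }
    }
}

impl Drop for Stopwatch<'_> {
    fn drop(&mut self) {
        // Runs automatically when the owning scope ends.
        println!("{} took {:?}", self.label, self.start.elapsed());
        self.finished.borrow_mut().push(self.label);
    }
}

fn timed_sum(finished: &RefCell<Vec<&'static str>>) -> u64 {
    let total = {
        // Bind to `_sw`, not `_`: `let _ = ...` would drop it immediately.
        let _sw = Stopwatch::new("sum", finished);
        (1..=1000u64).sum::<u64>()
    }; // `_sw` is dropped here, so the timing is reported at this point
    total
}
```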
| dkarl wrote:
| If your C++ codebase uses RAII to manage locks and other
| expensive resources, I think it's fine to use blocks to define
| their scope. The alternative is to encapsulate the scope in a
| new function or method, which is great if it improves
| readability, but not if it reduces readability.
| DC-3 wrote:
| One of the most common usages I find is to restrict the scope of
| RAII locks such as Mutexes.
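|
| For example, in Rust (a sketch; `push_and_check` is a hypothetical
| function): the `MutexGuard` lives only inside the block, so the
| lock is released as soon as the block ends rather than at the end
| of the function.

```rust
use std::sync::Mutex;

/// Hypothetical example: push under the lock, then report whether the
/// lock is free again once the block has ended.
fn push_and_check(shared: &Mutex<Vec<i32>>, value: i32) -> (usize, bool) {
    let len = {
        let mut guard = shared.lock().unwrap(); // lock acquired
        guard.push(value);
        guard.len()
    }; // `guard` dropped here: lock released
    // Anything slow that doesn't need the lock can go here.
    let unlocked = shared.try_lock().is_ok();
    (len, unlocked)
}
```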
| teeray wrote:
| I've tried doing this a few times in a legacy codebase to contain
| the definition of `err` when it changes type (this team did not
| believe in the error interface for a while).
|
| The problem is that it doesn't stand out visually, and it's
| uncommon, so less-experienced team members would have difficulty
| comprehending what's going on. In the end we just opted to use
| two different variables for the two types of error.
| ajb wrote:
| This feature is also there in C and C++ and probably other
| languages. The catch is that js seems to have inherited the
| syntax but not the scoping rules.
| papercrane wrote:
| It's in Java and D as well, which likely inherited it from C.
| I'm pretty sure C inherited it from ALGOL's "BEGIN ... END"
| blocks.
|
| The catch with JS is that variable scoping rules are different
| when you use 'var'; if you use 'let' and 'const', the scoping
| rules work like most programmers would expect for block
| statements.
| pdpi wrote:
| `var` has weird scoping rules, but you can get the "right"
| behaviour in modern javascript, using `let`.
| a_humean wrote:
| Has that not been fixed since ES2015?
|
|     const a = 5;
|     console.log(a)
|     {
|       const a = 10;
|       console.log(a)
|     }
|     console.log(a)
|
|     // Output:
|     // 5
|     // 10
|     // 5
|
| What we are missing from say Rust in js is the blocks being
| expressions, though there is a proposal ("do expressions") to
| allow this:
|
|     const a = do {
|       if (b) { 5 }
|       else { 10 }
|     };
|
| https://github.com/tc39/proposal-do-expressions
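|
| For comparison, a Rust sketch of both JavaScript snippets in this
| comment (the function name is hypothetical): inner `let` bindings
| shadow only within their block, and since `if` and blocks are
| already expressions, no `do` keyword is needed.

```rust
/// Mirrors the JavaScript examples: block scoping, then an
/// expression-valued `if` like the proposed `do { ... }`.
fn scoping_demo(b: bool) -> (i32, i32, i32, i32) {
    let a = 5;
    let before = a;
    let inner = {
        let a = 10; // shadows the outer `a` only inside this block
        a
    };
    let after = a; // the outer `a` is still 5
    // Rust's counterpart to `const a = do { if (b) { 5 } else { 10 } };`
    let chosen = if b { 5 } else { 10 };
    (before, inner, after, chosen)
}
```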
| throwaway90650 wrote:
| Try it with var
| arp242 wrote:
| s/var/let/g and you're good. This is the entire reason
| "let" was added in the first place, to correct this
| historical mistake. It's supported pretty much everywhere
| even by extremely conservative standards, and it's even the
| same number of letters. Literally no reason not to use it.
|
| I personally dislike JavaScript. Even modern JavaScript.
| But this is a bullshit reason to hate on JavaScript.
| a_humean wrote:
| Fortunately ES2015 was 7+ years ago, so no I won't. :)
| throwaway894345 wrote:
| Every few years I'll jump into JavaScript for one thing
| or another, and recently I jumped back into TypeScript
| and holy cow it's shaping up to be a really nice
| language. I really like `const` as well as TypeScript's
| concept of type widening/narrowing (`"foo"` is its own
| type as a specific string but can be widened to
| `string`), which allows the compiler to know that
| `document.createElement("table")` returns an
| `HTMLTableElement` rather than `HTMLElement`--this could
| have been avoided by having a
| `document.createTableElement()` method with its own
| signature, but given that it has to work with older,
| dynamically typed APIs, this is a pretty elegant
| solution.
|
| Similarly, if I have a union of string literal types
| `type FooBar = "foo" | "bar";`, TypeScript seems to know that `if
| (["foo", "bar"].includes(x)) {...}` exhaustively handles
| all members of `FooBar` (no need for a `switch`
| statement with explicit arms).
|
| The static typing really helps me avoid a bunch of
| "undefined is not a function" stuff that I would waste
| time with in JavaScript.
|
| Pretty cool stuff!
| shadowgovt wrote:
| TypeScript's type language is extraordinarily powerful.
| It's completely reframed the way I do web development; I
| tend to do much more functional and less method-based
| semantics these days because interfaces and generic types
| make that feasible without going mad from losing track of
| what functions can be applied to what data (and receiving
| no help from the very lax type semantics and runtime of
| regular JavaScript).
| kelnos wrote:
| Well, yeah, you can do this in pretty much any language that has
| delineated scopes like that.
|
| Of course, blocks-are-expressions is necessary if you want to
| return the result of your computations from the block and store
| it in a variable outside the block. (You can of course declare
| the storage variable outside the block and assign to it inside,
| but that's less nice.)
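|
| Both variants side by side in Rust, as a minimal sketch with
| hypothetical function names. Rust even allows the first form
| without `mut`, because the compiler checks that `doubled` is
| assigned exactly once before it is used.

```rust
/// Storage declared outside the block, assigned inside it.
fn with_outer_assignment(n: i32) -> i32 {
    let doubled; // no `mut` needed: assigned exactly once below
    {
        let tmp = n * 2;
        doubled = tmp;
    }
    doubled
}

/// The block itself yields the value.
fn with_block_expression(n: i32) -> i32 {
    let doubled = {
        let tmp = n * 2;
        tmp
    };
    doubled
}
```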
|
| GNU C even has an extension called "statement expressions" where
| blocks _can_ return values; the syntax looks like this:
|
|     int foo = ({
|         int bar = 4 * 3;
|         bar;
|     });
|
| Clang implements it in addition to GCC, as do a few other
| compilers. (Notably, IIRC, MSVC does not.)
| rcme wrote:
| Some of the error handling examples in Go are unnecessary, as Go
| allows you to access variables defined in your if-statement in
| other branches. For instance:
|
|     if result, err := something(); err == nil {
|         if result.RowsAffected() == 0 {
|             return nil
|         }
|     } else if err != nil {
|         return err
|     }
| xigoi wrote:
| What's the point of the "if err != nil" after the "else"?
| cmontella wrote:
| The best use I've found for this is to make sure a borrow of an
| `Rc<RefCell<_>>` is dropped when you want it to be. Otherwise I've
| run into "this thing is already borrowed" errors when the runtime
| decided the thing was still borrowed by something else.
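|
| A small sketch of that situation, assuming an
| `Rc<RefCell<Vec<usize>>>` (the function name is made up): ending
| the shared borrow with a block lets the later `borrow_mut()`
| succeed instead of panicking.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical example: read the length, then push it.
fn append_len(shared: &Rc<RefCell<Vec<usize>>>) -> usize {
    let len = {
        let items = shared.borrow(); // shared borrow starts
        items.len()
    }; // `items` (a `Ref`) is dropped here, ending the borrow
    // Had `items` still been in scope, this `borrow_mut()` would panic
    // with an "already borrowed" error.
    shared.borrow_mut().push(len);
    let new_len = shared.borrow().len();
    new_len
}
```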
| xiphias2 wrote:
| Functions do the same thing, but are more readable due to
| explicit naming.
|
| It can be great though as an intermediate step to extracting
| functions.
| wongarsu wrote:
| Functions fulfill the same function, but they move the code to
| a different place. Sometimes that's desirable for readability,
| sometimes it's more readable to keep things inline. It's nice
| to have both options.
| a_humean wrote:
| These blocks have other advantages, such as having access to the
| outer scope. Yes, those could be passed in as arguments to a
| function, but I think there are many cases where that extra
| overhead feels unreasonable given you have already gathered all
| of the values here for this purpose anyway.
| epidemian wrote:
| Yep. And they also keep single-use code local to the only
| place it's being used at the moment, which I think can help
| the readability and maintainability of said code :)
|
| Functions do have a place too of course; even single-use
| ones. Especially when you can give them a clear purpose and
| name.
| Pulcinella wrote:
| A sort of reverse or corollary to DRY, "don't be
| unintentionally repeatable."
| Jtsummers wrote:
| It's more of an extension to "Don't use globals", but at
| a more fine-grained level. Code using this can still be
| DRY, if you understand DRY to be "don't repeat the same
| information excessively". Reusing the same variable name
| in multiple (similar or dissimilar) contexts is not a DRY
| violation in that sense if the names are more a
| coincidence than actual shared information.
|
| Most programmers have no problem, for instance, with
| seeing multiple loops declared like:
| for(int i = ...; ...; ...) { ... }
|
| `i` is "repeated", but there's no objection in having
| multiple declarations since each has a meaning dependent
| on its local context.
| arp242 wrote:
| I like this for some things where the logic is somewhat large-
| ish, but also strongly coupled. A good example is setting up
| state for integration tests; I'm not going to re-use any
| functions I create for that. It's not _bad_ to use functions, I
| just find it more convenient to keep it all in one function,
| but split out a little bit in these faux-subfunctions.
|
| I don't use it often, but when I do, I find it convenient.
| jerf wrote:
| "A good example is setting up state for integration tests;
| I'm not going to re-use any functions I create for that."
|
| And if it turns out I'm wrong about that, the braces provide
| a very nice, guaranteed cutting point in the future that
| anyone can understand without having to load all the context
| of the function.
|
| This isn't the sort of thing I want splattered all over a
| code base, but it is a nice niche tool.
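|
| To make the "cutting point" concrete, here is a hedged Rust sketch
| with made-up names: the parse step first lives in a block, then
| the same code is cut out along its braces, with the block's free
| variable (`raw`) becoming a parameter and its value becoming the
| return value.

```rust
/// Inline version: the parse step is grouped in a block, so `parts`
/// does not leak into the rest of the function.
fn report_inline(raw: &str) -> String {
    let numbers: Vec<i64> = {
        let parts = raw.split(',');
        parts.filter_map(|p| p.trim().parse::<i64>().ok()).collect()
    };
    let total: i64 = numbers.iter().sum();
    format!("{} numbers, total {}", numbers.len(), total)
}

/// The same block, cut out at its braces.
fn parse_numbers(raw: &str) -> Vec<i64> {
    raw.split(',')
        .filter_map(|p| p.trim().parse::<i64>().ok())
        .collect()
}

/// Extracted version: behaves the same as `report_inline`.
fn report_extracted(raw: &str) -> String {
    let numbers = parse_numbers(raw);
    let total: i64 = numbers.iter().sum();
    format!("{} numbers, total {}", numbers.len(), total)
}
```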
| jotaen wrote:
| A somewhat related technique which I often find useful is
| something that's known as "immediately invoked function
| expressions" (IIFE) in JavaScript. That also creates a sub-scope
| in place, but it lets you return values to the enclosing scope.
| E.g.:
|
|     result := func() string {
|         helperVar1 := //...
|         helperVar2 := //...
|         return helperVar1 + helperVar2
|     }()
|
| So it's basically an anonymous function that is invoked right
| away. You could achieve the same scope separation by pulling it
| out as named function, but sometimes I like it better to keep
| things closer together.
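|
| The Rust analogue is an immediately invoked closure. A hedged
| sketch (`sum_pair` is hypothetical): one practical reason to use
| it there is that `?` inside the closure returns from the closure,
| not from the enclosing function, so errors can be handled in place.

```rust
use std::num::ParseIntError;

/// Hypothetical example: parse two numbers, falling back to 0 on error.
fn sum_pair(a: &str, b: &str) -> i64 {
    let result = (|| -> Result<i64, ParseIntError> {
        let x: i64 = a.trim().parse()?; // `?` exits the closure only
        let y: i64 = b.trim().parse()?;
        Ok(x + y)
    })(); // invoked immediately
    result.unwrap_or(0)
}
```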
| throwaway894345 wrote:
| This is really helpful in Go for `defer`. For example, if I'm
| manipulating files in a loop, I don't want to do:
|
|     for _, fileName := range fileNames {
|         f, err := os.Open(fileName)
|         if err != nil {
|             return err
|         }
|         defer f.Close()
|         doSomething(f)
|     }
|
| ... because I might run out of quota for open file handles. I
| want the defer to trigger at the end of each loop iteration
| rather than at the end of the function, so I'll often put a
| closure in the loop body:
|
|     for _, fileName := range fileNames {
|         if err := func() error {
|             f, err := os.Open(fileName)
|             if err != nil {
|                 return err
|             }
|             defer f.Close() // runs when the closure returns, once per iteration
|             doSomething(f)
|             return nil
|         }(); err != nil {
|             return err
|         }
|     }
|
| That said, I don't like the ergonomics and if I'm doing a lot
| of file things, I'll write a `func withFile(fileName string,
| callback func(*os.File) error) error` function which often
| composes more nicely.
| avgcorrection wrote:
| Yet again the heavy initialism that looks like it came right
| out of a C++ standards document is just a long name for a
| simple thing.
| arp242 wrote:
| I use this for package globals especially:
|
|     var pkgGlobal = func() string { ... }()
|
| Much better than:
|
|     var pkgGlobal string
|
|     func init() { pkgGlobal = ... }
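|
| In Rust, a roughly similar effect comes from initialising a global
| with a block: a `const` item can use a block expression evaluated
| at compile time, and for values computed at run time,
| `std::sync::OnceLock` (stable since Rust 1.70) takes a closure.
| The names below are hypothetical.

```rust
use std::sync::OnceLock;

/// Compile-time global built from a block expression.
const PAGE_SIZE: usize = {
    let kib = 4;
    kib * 1024 // the block's value becomes the constant
};

/// Run-time global, initialised on first use by the closure's block.
fn greeting() -> &'static str {
    static GREETING: OnceLock<String> = OnceLock::new();
    GREETING
        .get_or_init(|| {
            let name = "world";
            format!("hello, {name}")
        })
        .as_str()
}
```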
| shadowgovt wrote:
| Conceptually yes, but IIUC there should be worlds of difference
| at the compilation output level (i.e. unlike calling a
| function, the compiler's not obligated to set up or tear down a
| context every time it enters or exits a block; it can just
| treat the static scope semantics without having any impact on
| the runtime semantics).
|
| ETA: except for go's `defer`, and off the top of my head I
| don't actually know if Go is obliged to run the defer
| immediately upon exiting the block or can choose to run it at
| some other point in the function.
| Jtsummers wrote:
| In Go, `defer` runs at the end of the enclosing function, not
| the enclosing block.
| vlovich123 wrote:
| That shouldn't be the case for c++ and Rust lambdas that are
| immediately invoked. The compiler should see through it.
| slaymaker1907 wrote:
| Not necessarily for C++. At least with MSVC, you can use
| `__declspec(noinline)` on a lambda. This can be handy for
| complex macros that would otherwise allocate a bunch of
| memory.
| ollien wrote:
| > unlike calling a function, the compiler's not obligated to
| set up or tear down a context
|
| I guess it depends on what you mean by "context" but the spec
| is very clear that a block creates scope, and the end removes
| scope.
|
| https://go.dev/ref/spec#Declarations_and_scope
|
| > The scope of a constant or variable identifier declared
| inside a function begins at the end of the ConstSpec or
| VarSpec (ShortVarDecl for short variable declarations) and
| ends at the end of the innermost containing block.
| slaymaker1907 wrote:
| One caution on use of blocks is that at least for C++, you can
| end up having a large amount of stack usage. While destructors
| follow strict lexical scoping, stack allocations are only
| guaranteed to be released at the end of a (non-inline) function.
| The compiler can reuse memory allocated on the stack for multiple
| variables assuming the lifetimes don't overlap, but this isn't
| guaranteed. For an example of when reuse doesn't help:
|
|     void someProc() {
|         char buffer1[256];
|         {
|             char buffer2[256];
|             // string ops
|         }
|         // we still have 512 bytes on the stack, whereas it would only be
|         // 256 if we used a non-inline function instead of a block
|         someDeepCall();
|     }
|
| I'm not sure what Rust or Go do in cases like this.
| zmj wrote:
| C# has the same issue for stack-allocated spans. Doing that in
| a loop can easily cause a stack overflow.
| [deleted]
| pcwalton wrote:
| Rust does the same thing as C++ here. Note that rustc is pretty
| aggressive about using the LLVM lifetime intrinsics to allow
| for stack coloring (i.e. reuse of stack slots as necessary).
| titaniczero wrote:
| I have thought about this, but sometimes I reuse variables on
| purpose to reuse the allocated memory, especially in a language
| like Golang where it is not always clear whether something will
| be allocated on the stack or the heap. I guess we should avoid
| doing this in hot paths, right?
| catfishx wrote:
| In rust you can use these almost everywhere in place of an
| expression (if statements, while loops, in function arguments,
| etc.). It can sometimes make code a bit unreadable to some people,
| but it's still a cool feature imo.
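|
| One well-known example of a block in a `while` condition, offered
| as a sketch (`digit_count` is a made-up function): the block runs
| the loop body first and then yields the condition, which emulates
| a do-while loop.

```rust
/// Counts decimal digits; the do-while shape makes 0 count as one digit.
fn digit_count(mut n: u64) -> u32 {
    let mut digits = 0;
    while {
        // Loop body runs before the condition is checked.
        digits += 1;
        n /= 10;
        n > 0 // the block's value is the loop condition
    } {}
    digits
}
```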
| masklinn wrote:
| Conveniently in Rust because a block is an expression it lets
| you "convert" a statement (or sequence thereof) into an
| expression.
|
| To be used sparingly, but very useful when it applies e.g. in
| the precise capture clause pattern, or for non-trivial object
| initialisation (as Rust doesn't have many literals or literal-
| ish macros).
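|
| A sketch of what we take to be the capture pattern mentioned here
| (names are hypothetical): the block clones the `Arc` and then
| evaluates to a `move` closure, so the closure captures only the
| clone while the caller keeps its own handle.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical example: a worker thread reads a clone of `shared`.
fn len_twice(shared: &Arc<Vec<i32>>) -> usize {
    let worker = thread::spawn({
        let shared = Arc::clone(shared); // only the clone is moved
        move || shared.len()
    });
    // The caller's `shared` is still usable here.
    worker.join().unwrap() + shared.len()
}
```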
| puffoflogic wrote:
| Here's one notable use that I doubt many are familiar with:
|
|     let iter = some_iterator_expression;
|     {iter}.nth(2)
|
| For some reason, nth takes &mut self instead of self (note
| that T: Iterator => &mut T: Iterator<Item=T::Item> so there
| was no need for this in the API design to support having
| access to the iterator after exhaustion; it was a mistake).
| So if you tried to use it like iter.nth(2) with the non-mut
| binding, that would fail. But {iter} forces iter to be moved
| - unlike (iter) which would not. Then iter becomes a
| temporary and we're free to call a &mut self method on it.
|
| In general: {expression} turns a place expression into a
| value expression, i.e. forcing the place to be moved from.
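|
| A runnable version of that trick, as a sketch (`third` is a
| hypothetical function); the call is placed in a `let` initialiser
| here for clarity.

```rust
/// Returns the third item of any iterator passed by value.
fn third<I: Iterator>(iter: I) -> Option<I::Item> {
    // `iter.nth(2)` would not compile here: `nth` takes `&mut self`,
    // and the binding `iter` is not declared `mut`.
    // `{ iter }` is a value expression: it moves `iter` into a
    // temporary, and a temporary can be borrowed mutably.
    let item = { iter }.nth(2);
    item
}
```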
| owaislone wrote:
| I love this feature. I use it most of the time in large functions
| where I can group some lines as one logical piece but not generic
| enough to be a function. It helps with reading the code back and
| prevents leaking vars into the rest of the function scope. Such
| blocks end up becoming scattered single-use functions when
| extracted, which isn't ideal.
| merb wrote:
| I'm not sure if this is the best advice for Rust, since you will
| be playing with variable lifetimes then.
| kibwen wrote:
| In the context of Rust, using a block can only strictly reduce
| the lifetime of a local, so if that would lead to a problem it
| would just not compile. :P
| saghm wrote:
| This used to be standard practice in Rust before NLL (non-
| lexical lifetimes) were implemented. As a trivial example,
| here's some code that works on Rust today: https://play.rust-
| lang.org/?version=stable&mode=debug&editio...
|
| However, try to compile this with Rust 1.0 (which you can get
| by running `rustup update 1.0.0` and then using `cargo +1.0.0
| run` or `rustc +1.0.0`) and you'll get an error saying that you
| can't push another element onto the vector due to it already
| being borrowed by the slice. This is because the borrow checker
| previously assumed that any borrow would remain in use for the
| remainder of the scope. The "fix" to this was to manually put
| in a block to "end" the borrow early. However, a lot of work
| was done to make the borrow checker more sophisticated, and
| Rust 1.31 (near the end of 2018: https://blog.rust-
| lang.org/2018/12/06/Rust-1.31-and-rust-201...) shipped the work
| that lets the compiler recognize when a borrow is no longer
| used before the end of a scope, and therefore allow subsequent
| borrows that it previously would have considered conflicting.
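|
| A minimal sketch of the pattern being described (the function is
| hypothetical and is not the playground code linked in this
| comment): the block ends the shared borrow of `v` before `push`,
| which pre-NLL compilers required and current ones no longer need.

```rust
/// Borrow a slice of `v`, then push to `v` once the borrow has ended.
fn push_after_borrow() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3];
    let prefix_sum = {
        let first_two = &v[..2]; // shared borrow of `v`
        first_two.iter().sum::<i32>()
    }; // before NLL, ending the scope here was how you ended the borrow
    v.push(4); // needs a mutable borrow of `v`
    (prefix_sum, v)
}
```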
|
| All that being said, there's still a useful feature of blocks
| in Rust that I don't see mentioned in this blog post: blocks in
| Rust are actually expressions! By default, the last value in a
| Rust block will be yielded, but you can also leave a labeled
| block early with `break 'label value` (stable since Rust 1.65),
| similar to how the return value of a Rust function will be the
| last value, but you can also explicitly `return` earlier. As an
| added bonus, this also works for `loop` blocks; since they will
| only ever terminate if `break` is explicitly invoked (unlike
| `while` or `for`), whatever value is specified will be yielded
| from the loop.
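|
| Both forms in a short Rust sketch (function names are made up): a
| labeled block that yields early with `break 'label value`, and a
| `loop` whose `break` carries the result.

```rust
/// A labeled block that can yield early (labeled block breaks need Rust 1.65+).
fn classify(n: i32) -> &'static str {
    let kind = 'check: {
        if n < 0 {
            break 'check "negative";
        }
        if n == 0 {
            break 'check "zero";
        }
        "positive" // yielded when no `break` is taken
    };
    kind
}

/// `loop` yields the value given to `break`.
/// Assumes `target <= 2^31` so that `p` cannot overflow.
fn first_power_of_two_at_least(target: u32) -> u32 {
    let mut p = 1;
    let result = loop {
        if p >= target {
            break p;
        }
        p *= 2;
    };
    result
}
```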
| Anaminus wrote:
| This is a thing in Lua as well, with do-end blocks:
|
|     local foo
|     do
|         local bar = 42
|         -- Same as `foo = function() ... end`, so this sets the local foo variable.
|         function foo()
|             return bar
|         end
|     end
|
| Because Lua has no significant whitespace, it can be made to look
| like some kind of specific syntax:
|
|     local foo do
|         local bar = 42
|         function foo()
|             return bar
|         end
|     end
|
| Though I think this is too clever, so I like to insert a
| semicolon to make it clear what is happening:
|
|     local foo; do
|         local bar = 42
|         function foo()
|             return bar
|         end
|     end
| maest wrote:
| Other people in the comments have mentioned you can use functions
| to achieve some of the benefits of the {blocks}.
|
| I wanted to point out how q/kdb handles this, because I think
| it's quite nice.
|
| In kdb, blocks are how you define functions.
| {x+1};
|
| That is a function which takes your argument, adds one to it and
| returns it. (x is the default name for the function argument,
| another lovely piece of design).
|
| If you want to have a named function, just assign it to a
| variable:
|
|     my_inc: {x+1};
|
| Now you can call my_inc(1) and get back 2.
|
| Light, consistent and reusable language design, very nice. No
| need to have two ways of defining functions (e.g. the needless
| separation of def and lambdas in Python).
|
| The upshot of this is that you get these block constructs for
| free:
|
|     myvar_a: 1;
|     myvar_b: 2;
|     my_top_level_var: {
|         //less important work
|     }[];
|
| (The one downside is that you need to call the function with the
| [] brackets)
| surprisetalk wrote:
| Wow, super cool design decision!
|
| Can you give multiple named variables? How would you write a
| function like this?
|
|     const f = (a,b,c) => {
|         return a * b + c
|     }
|
| Also, is this just q/kdb, or does this also apply to K?
| leprechaun1066 wrote:
| K4 is the implementation language of q, kdb is the database
| part of the language. Most q is just named utility functions
| written in K4. There isn't really much difference in what the
| machine does with them under the covers, but when stuck on a
| problem talking about q is more likely to get help than k.
|
| You can provide up to 8 named inputs in a function
| definition.
|
| q example for running sum of 8 inputs:
|
|     q)f:{[a;b;c;d;e;f;g;h]sums a,b,c,d,e,f,g,h}
|
| same in K4:
|
|     q)\
|       f:{[a;b;c;d;e;f;g;h]+\a,b,c,d,e,f,g,h}
| maest wrote:
| The default variable names go all the way up to z. So I
| would write your function as:
|
|     f:{z+y*x}
|
| (k is evaluated strictly right to left, another design
| decision that I quite like, so I had to move the variables
| around in the expression.)
|
| If you wish, you can provide your own variable names as
| follows:
|
|     f:{[a;b;c]c+b*a}
|
| This works in k and q (q is largely k with some nice-to-have
| functions defined on top).
| bxparks wrote:
| I find this useful in unit tests where I often find myself
| duplicating blocks of assertions multiple times. The {block}
| helps to prevent accidentally reusing variables which are defined
| by previous blocks of assertions.
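|
| A hedged Rust sketch of that habit, with a made-up `parse_kv`
| function under test: each block gets its own `parsed`, so a later
| assertion cannot accidentally check a value left over from an
| earlier block.

```rust
/// Hypothetical function under test: splits "key = value".
fn parse_kv(line: &str) -> Option<(&str, &str)> {
    let (key, value) = line.split_once('=')?;
    Some((key.trim(), value.trim()))
}

/// Test body with each group of assertions in its own block.
fn check_parse_kv() {
    {
        let parsed = parse_kv("a = 1");
        assert_eq!(parsed, Some(("a", "1")));
    }
    {
        // A fresh `parsed`; the one above is already out of scope.
        let parsed = parse_kv("no equals sign");
        assert_eq!(parsed, None);
    }
}
```

| In a real crate, `check_parse_kv` would typically be a `#[test]`
| function.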
|
| I rarely use a free-standing {block} in actual code. I think it's
| because if something is worthy enough to be logically grouped
| into a {block}, then it is probably worthwhile to pull it out
| into its own function, method or lambda expression.
| ithkuil wrote:
| I use blocks in Rust in normal code as a way to control how
| long a lock is held. Not sure if that's the best practice.
| tialaramex wrote:
| That makes sense, you can instead specifically Drop the lock
| guard but the block scope ending will do that.
|
| If you aren't already, it's perhaps better to identify an
| object that's being locked, which you can have the Mutex
| wrap. So e.g. you could have a Mutex<Goose>, and then
| functions, even methods, can take a reference to a Goose to
| ensure you can't call them by mistake without locking the
| Mutex, since you otherwise don't have a Goose. If the Goose
| doesn't need any actual data, this is free at runtime: the
| compiler's type check ensures you took the lock as needed,
| but since Goose is a zero-sized type, no machine code is
| emitted to deal with a Goose variable.
|
| Probably your application has a better name for what is being
| protected than Goose, that's just an example, but having some
| object that is locked can help ensure you get the locking
| right and that your mental model of what is being "locked" is
| coherent.
|
| Of course sometimes there really is no specific thing being
| locked, even an imaginary one, it's just lock #4 or whatever,
| but in my experience that's rare.
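|
| A minimal sketch of the `Mutex<Goose>` idea (the names are only
| illustrative, as in the comment): `honk` demands a `&Goose`, and
| the only `Goose` here lives inside the mutex, so calling it
| requires taking the lock. In real code you would also keep `Goose`
| constructible only inside its own module.

```rust
use std::sync::Mutex;

/// Zero-sized token standing for "the thing being locked".
struct Goose;

/// Callable only with a `&Goose`, i.e. while holding the lock.
fn honk(_goose: &Goose, log: &mut Vec<&'static str>) {
    log.push("honk");
}

fn honk_locked(lock: &Mutex<Goose>, log: &mut Vec<&'static str>) {
    {
        let goose = lock.lock().unwrap(); // take the lock
        honk(&goose, log); // `&MutexGuard<Goose>` derefs to `&Goose`
    } // guard dropped: lock released
    log.push("released");
}
```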
___________________________________________________________________
(page generated 2023-01-25 23:01 UTC)