hngopher.com

       [HN Gopher] Using {blocks} in Rust and Go for fun and profit
       ___________________________________________________________________
        
       Using {blocks} in Rust and Go for fun and profit
        
       Author : surprisetalk
       Score  : 106 points
       Date   : 2023-01-25 13:24 UTC (9 hours ago)
        
 (HTM) web link (taylor.town)
 (TXT) w3m dump (taylor.town)
        
       | infogulch wrote:
       | This is one of my favorite features of rust, especially since you
       | can return a value from a block since the language is 'expression
       | oriented'.
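
A minimal sketch of the point above, assuming nothing beyond the standard library: a Rust block evaluates to its final expression, so a helper variable can stay private to the block while the result is bound outside it. The function name `describe` and the threshold are hypothetical, chosen only for illustration.

```rust
/// Classifies a temperature using a block whose final expression is its value.
fn describe(celsius: i32) -> String {
    // `fahrenheit` exists only inside the block; only `label` escapes.
    let label = {
        let fahrenheit = celsius * 9 / 5 + 32;
        if fahrenheit >= 86 { "hot" } else { "mild" }
    };
    format!("{} C is {}", celsius, label)
}

fn main() {
    println!("{}", describe(35));
    println!("{}", describe(10));
}
```

Referring to `fahrenheit` after the block is a compile error, which is the scoping benefit the thread is discussing.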
        
         | mirekrusin wrote:
         | Yes, not having everything-is-expression in js hurts.
        
           | indeyets wrote:
           | There's a dormant proposal about this:
           | https://github.com/tc39/proposal-do-expressions
        
       | allo37 wrote:
       | I do this quite a bit in C and C++; I find it's a great way to
       | reduce the mental effort required to understand a long function
       | without having to jump around the source like when it is broken
       | up into multiple functions.
       | 
       | In C++ you can really (ab)use it to do things like scoped mutex
       | locks and "stopwatches" that start a timer on construction and
       | print the elapsed time on destruction.
       | 
       | Some people find it a bit bizarre though, to each his/her own I
       | guess.
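
The "stopwatch" idea described above for C++ carries over to Rust via `Drop`. The following is only a sketch: `Stopwatch`, `timed_sum`, and the report vector are hypothetical names, and a real timer would more likely print or log than push into a `RefCell`.

```rust
use std::cell::RefCell;
use std::time::{Duration, Instant};

/// Hypothetical scope timer: starts on construction, reports when dropped.
struct Stopwatch<'a> {
    label: &'static str,
    start: Instant,
    report: &'a RefCell<Vec<(&'static str, Duration)>>,
}

impl<'a> Stopwatch<'a> {
    fn start(label: &'static str, report: &'a RefCell<Vec<(&'static str, Duration)>>) -> Self {
        Stopwatch { label, start: Instant::now(), report }
    }
}

impl Drop for Stopwatch<'_> {
    fn drop(&mut self) {
        // Runs when the enclosing block ends.
        self.report.borrow_mut().push((self.label, self.start.elapsed()));
    }
}

/// Times only the summation; returns the sum and the labels that were recorded.
fn timed_sum(n: u64) -> (u64, Vec<&'static str>) {
    let report = RefCell::new(Vec::new());
    let total = {
        let _timer = Stopwatch::start("sum", &report);
        (1..=n).sum::<u64>()
    }; // `_timer` is dropped here, so the elapsed time is recorded
    let labels: Vec<&'static str> = report.borrow().iter().map(|(label, _)| *label).collect();
    (total, labels)
}

fn main() {
    let (total, labels) = timed_sum(1_000);
    println!("total = {}, timed sections = {:?}", total, labels);
}
```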
        
         | dkarl wrote:
         | If your C++ codebase uses RAII to manage locks and other
         | expensive resources, I think it's fine to use blocks to define
         | their scope. The alternative is to encapsulate the scope in a
         | new function or method, which is great if it improves
         | readability, but not if it reduces readability.
        
       | DC-3 wrote:
       | One of the most common usages I find is to restrict the scope of
       | RAII locks such as Mutexes.
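
As a rough illustration of restricting a lock's scope, here is a sketch using `std::sync::Mutex`. The function `push_and_len` is hypothetical; the point is only that the guard is released at the closing brace rather than at the end of the function.

```rust
use std::sync::Mutex;

/// Appends `item` and returns the new length, holding the lock only briefly.
fn push_and_len(queue: &Mutex<Vec<u32>>, item: u32) -> usize {
    let len = {
        let mut guard = queue.lock().unwrap();
        guard.push(item);
        guard.len()
    }; // guard dropped here: the mutex is unlocked

    // The lock is free again, so a second acquisition succeeds.
    assert!(queue.try_lock().is_ok());
    len
}

fn main() {
    let queue = Mutex::new(vec![1, 2]);
    println!("len = {}", push_and_len(&queue, 3));
}
```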
        
       | teeray wrote:
       | I've tried doing this a few times in a legacy codebase to contain
       | the definition of `err` when it changes type (this team did not
       | believe in the error interface for a while).
       | 
       | The problem is that it doesn't stand out visually, and it's
       | uncommon, so less-experienced team members would have difficulty
       | comprehending what's going on. In the end we just opted to use
       | two different variables for the two types of error.
        
       | ajb wrote:
       | This feature is also there in C and C++ and probably other
       | languages. The catch is that js seems to have inherited the
       | syntax but not the scoping rules.
        
         | papercrane wrote:
         | It's in Java and D as well, which likely inherited it from C.
         | I'm pretty sure C inherited it from ALGOL's "BEGIN ... END"
         | blocks.
         | 
         | The catch with JS is that variable scoping rules are different
         | when you use 'var', if you use 'let' and 'const' the scoping
         | rules work like most programmers would expect for block
         | statements.
        
         | pdpi wrote:
         | `var` has weird scoping rules, but you can get the "right"
         | behaviour in modern javascript, using `let`.
        
         | a_humean wrote:
         | Has that not been fixed since ES2015?
         | 
         |     const a = 5;
         |     console.log(a)
         |     {
         |       const a = 10;
         |       console.log(a)
         |     }
         |     console.log(a)
         | 
         |     5
         |     10
         |     5
         | 
         | What we are missing from say Rust in js is the blocks being
         | expressions, though there is a proposal ("do expressions") to
         | allow this:
         | 
         |     const a = do {
         |       if (b) { 5 }
         |       else { 10 }
         |     };
         | 
         | https://github.com/tc39/proposal-do-expressions
        
           | throwaway90650 wrote:
           | Try it with var
        
             | arp242 wrote:
             | s/var/let/g and you're good. This is the entire reason
             | "let" was added in the first place, to correct this
             | historical mistake. It's supported pretty much everywhere
             | even by extremely conservative standards, and it's even the
             | same number of letters. Literally no reason not to use it.
             | 
             | I personally dislike JavaScript. Even modern JavaScript.
             | But this is a bullshit reason to hate on JavaScript.
        
             | a_humean wrote:
             | Fortunately ES2015 was 7+ years ago, so no I won't. :)
        
               | throwaway894345 wrote:
               | Every few years I'll jump into JavaScript for one thing
               | or another, and recently I jumped back into TypeScript
               | and holy cow it's shaping up to be a really nice
               | language. I really like `const` as well as TypeScript's
               | concept of type widening/narrowing (`"foo"` is its own
               | type as a specific string but can be widened to
               | `string`), which allows the compiler to know that
               | `document.createElement("table")` returns an
               | `HTMLTableElement` rather than `HTMLElement`--this could
               | have been avoided by having a
               | `document.createTableElement()` method with its own
               | signature, but given that it has to work with older,
               | dynamically typed APIs, this is a pretty elegant
               | solution.
               | 
               | Similarly, if I have a discriminant union `type FooBar =
               | "foo" | "bar";`, TypeScript seems to know that `if
               | (["foo", "bar"].includes(x)) {...}` exhaustively handles
               | all permutations of `FooBar` (no need for a `switch`
               | statement with explicit arms).
               | 
               | The static typing really helps me avoid a bunch of
               | "undefined is not a function" stuff that I would waste
               | time with in JavaScript.
               | 
               | Pretty cool stuff!
        
               | shadowgovt wrote:
               | TypeScript's type language is extraordinarily powerful.
               | It's completely reframed the way I do web development; I
               | tend to do much more functional and less method-based
               | semantics these days because interfaces and generic types
               | make that feasible without going mad from losing track of
               | what functions can be applied to what data (and receiving
               | no help from the very lax type semantics and runtime of
               | regular JavaScript).
        
       | kelnos wrote:
       | Well, yeah, you can do this in pretty much any language that has
       | delineated scopes like that.
       | 
       | Of course, blocks-are-expressions is necessary if you want to
       | return the result of your computations from the block and store
       | it in a variable outside the block. (You can of course declare
       | the storage variable outside the block and assign to it inside,
       | but that's less nice.)
       | 
       | GNU C even has an extension called "statement expressions" where
       | blocks _can_ return values; the syntax looks like this:
       | 
       |     int foo = ({
       |         int bar = 4 * 3;
       |         bar;
       |     });
       | 
       | Clang implements it in addition to GCC, as do a few other
       | compilers. (Notably, IIRC, MSVC does not.)
        
       | rcme wrote:
       | Some of the error handling examples in Go are unnecessary, as Go
       | allows you to access variables defined in your if-statement in
       | other branches. For instance:
       | 
       |     if result, err := something(); err == nil {
       |         if result.RowsAffected() == 0 {
       |             return nil
       |         }
       |     } else if err != nil {
       |         return err
       |     }
        
         | xigoi wrote:
         | What's the point of the "if err != nil" after the "else"?
        
       | cmontella wrote:
       | Best use I've found for this is to make sure a borrowed Rc is
       | dropped when you want it to be. Otherwise I've run into "this
       | thing is already borrowed" errors when the runtime decided the
       | thing should still be borrowed by something else.
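
A sketch of the situation described above, assuming the commenter means an `Rc<RefCell<T>>` (the runtime "already borrowed" check comes from `RefCell`, not from `Rc` itself). The function `append_doubled_last` is a hypothetical example.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Doubles the last element and appends the result.
fn append_doubled_last(shared: &Rc<RefCell<Vec<i32>>>) {
    let doubled = {
        let items = shared.borrow(); // shared borrow starts
        let last = items.last().copied().unwrap_or(0);
        last * 2
    }; // `items` is dropped here, so the borrow is released

    // Without the block, `items` would live to the end of the function
    // and this `borrow_mut` would panic at runtime.
    shared.borrow_mut().push(doubled);
}

fn main() {
    let shared = Rc::new(RefCell::new(vec![1, 2, 3]));
    append_doubled_last(&shared);
    println!("{:?}", shared.borrow());
}
```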
        
       | xiphias2 wrote:
       | Functions do the same thing, but are more readable due to
       | explicit naming.
       | 
       | It can be great though as an intermediate step to extracting
       | functions.
        
         | wongarsu wrote:
         | Functions fulfill the same function, but they move the code to
         | a different place. Sometimes that's desirable for readability,
         | sometimes it's more readable to keep things inline. It's nice
         | to have both options
        
         | a_humean wrote:
         | These blocks have other advantages such as having access to the
         | outer scope. Yes those could be passed in as arguments to a
         | function, but I think there are many cases where that extra
         | overhead feels unreasonable given you have already gathered all
         | of the values here for this purpose anyway.
        
           | epidemian wrote:
           | Yep. And they also keep single-use code local to the only
           | place it's being used at the moment, which i think can help
           | the readability and maintainability of said code :)
           | 
           | Functions do have a place too of course; even single-use
           | ones. Especially when you can give them a clear purpose and
           | name.
        
             | Pulcinella wrote:
             | A sort of reverse or corollary to DRY, "don't be
             | unintentionally repeatable."
        
               | Jtsummers wrote:
               | It's more of an extension to "Don't use globals", but at
               | a more fine-grained level. Code using this can still be
               | DRY, if you understand DRY to be "don't repeat the same
               | information excessively". Reusing the same variable name
               | in multiple (similar or dissimilar) contexts is not a DRY
               | violation in that sense if the names are more a
               | coincidence than actual shared information.
               | 
               | Most programmers have no problem, for instance, with
               | seeing multiple loops declared like:
               | 
               |     for(int i = ...; ...; ...) { ... }
               | 
               | `i` is "repeated", but there's no objection in having
               | multiple declarations since each has a meaning dependent
               | on its local context.
        
         | arp242 wrote:
         | I like this for some things where the logic is somewhat large-
         | ish, but also strongly coupled. A good example is setting up
         | state for integration tests; I'm not going to re-use any
         | functions I create for that. It's not _bad_ to use functions, I
         | just find it more convenient to keep it all in one function,
         | but split out a little bit in these faux-subfunctions.
         | 
         | I don't use it often, but when I do, I find it convenient.
        
           | jerf wrote:
           | "A good example is setting up state for integration tests;
           | I'm not going to re-use any functions I create for that."
           | 
           | And if it turns out I'm wrong about that, the braces provide
           | a very nice, guaranteed cutting point in the future that
           | anyone can understand without having to load all the context
           | of the function.
           | 
           | This isn't the sort of thing I want splattered all over a
           | code base, but it is a nice niche tool.
        
       | jotaen wrote:
       | A somewhat related technique which I often find useful is
       | something that's known as "immediately invoked function
       | expressions" (IIFE) in JavaScript. That also creates a sub-scope
       | in place, but it lets you return values to the enclosing scope.
       | E.g.:
       | 
       |     result := func() string {
       |         helperVar1 := //...
       |         helperVar2 := //...
       |         return helperVar1 + helperVar2
       |     }()
       | 
       | So it's basically an anonymous function that is invoked right
       | away. You could achieve the same scope separation by pulling it
       | out as named function, but sometimes I like it better to keep
       | things closer together.
        
         | throwaway894345 wrote:
         | This is really helpful in Go for `defer`. For example, if I'm
         | manipulating files in a loop, I don't want to do:
         | 
         |     for _, fileName := range fileNames {
         |         f, err := os.Open(fileName)
         |         if err != nil {
         |             return err
         |         }
         |         defer f.Close()
         |         doSomething(f)
         |     }
         | 
         | ... because I might run out of quota for open file handles. I
         | want the defer to trigger at the end of each loop iteration
         | rather than at the end of the function, so I'll often put a
         | closure in the loop body:
         | 
         |     for _, fileName := range fileNames {
         |         if err := func() error {
         |             f, err := os.Open(fileName)
         |             if err != nil {
         |                 return err
         |             }
         |             defer f.Close()
         |             doSomething(f)
         |             return nil
         |         }(); err != nil {
         |             return err
         |         }
         |     }
         | 
         | That said, I don't like the ergonomics and if I'm doing a lot
         | of file things, I'll write a `func withFile(fileName string,
         | callback func(*os.File) error) error` function which often
         | composes more nicely.
        
         | avgcorrection wrote:
         | Yet again the heavy initialism that looks like it came right
         | out of a C++ standards document is just a long name for a
         | simple thing.
        
         | arp242 wrote:
         | I use this for package globals especially:
         | 
         |     var pkgGlobal = func() string {
         |         ...
         |     }()
         | 
         | Much better than:
         | 
         |     var pkgGlobal string
         |     func init() {
         |         pkgGlobal = ...
         |     }
        
         | shadowgovt wrote:
         | Conceptually yes, but IIUC there should be worlds of difference
         | at the compilation output level (i.e. unlike calling a
         | function, the compiler's not obligated to set up or tear down a
         | context every time it enters or exits a block; it can just
         | treat the static scope semantics without having any impact on
         | the runtime semantics).
         | 
         | ETA: except for go's `defer`, and off the top of my head I
         | don't actually know if Go is obliged to run the defer
         | immediately upon exiting the block or can choose to run it at
         | some other point in the function.
        
           | Jtsummers wrote:
           | In Go, `defer` runs at the end of the enclosing function, not
           | the enclosing block.
        
           | vlovich123 wrote:
           | That shouldn't be the case for c++ and Rust lambdas that are
           | immediately invoked. The compiler should see through it.
        
             | slaymaker1907 wrote:
             | Not necessarily for C++. At least for MSVC, you can use
             | `__declspec(noinline)` on a lambda. This can be handy for
             | complex macros that would otherwise allocate a bunch of
             | memory.
        
           | ollien wrote:
           | > unlike calling a function, the compiler's not obligated to
           | set up or tear down a context
           | 
           | I guess it depends on what you mean by "context" but the spec
           | is very clear that a block creates scope, and the end removes
           | scope.
           | 
           | https://go.dev/ref/spec#Declarations_and_scope
           | 
           | > The scope of a constant or variable identifier declared
           | inside a function begins at the end of the ConstSpec or
           | VarSpec (ShortVarDecl for short variable declarations) and
           | ends at the end of the innermost containing block.
        
       | slaymaker1907 wrote:
       | One caution on the use of blocks is that, at least for C++, you
       | can end up with a large amount of stack usage. While destructors
       | follow strict lexical scoping, stack allocations are only
       | guaranteed to be released at the end of a (non-inline) function.
       | The compiler can reuse memory allocated on the stack for multiple
       | variables assuming the lifetimes don't overlap, but this isn't
       | guaranteed. For an example of when reuse doesn't help:
       | 
       |     void someProc() {
       |         char buffer1[256];
       |         {
       |             char buffer2[256];
       |             // string ops
       |         }
       |         // we still have 512 bytes on the stack whereas it would
       |         // only be 256 if we used a non-inline function instead
       |         // of a block
       |         someDeepCall();
       |     }
       | 
       | I'm not sure what Rust or Go do in cases like this.
        
         | zmj wrote:
         | C# has the same issue for stack-allocated spans. Doing that in
         | a loop is often a stack overflow.
        
         | [deleted]
        
         | pcwalton wrote:
         | Rust does the same thing as C++ here. Note that rustc is pretty
         | aggressive about using the LLVM lifetime intrinsics to allow
         | for stack coloring (i.e. reuse of stack slots as necessary).
        
       | titaniczero wrote:
       | I have thought about this but sometimes I reuse the variables on
       | purpose to reuse the memory allocated, especially in a language
       | like Golang where sometimes it is not that clear whether it will
       | be allocated in the stack or the heap. I guess we should avoid
       | doing this in hot paths, right?
        
       | catfishx wrote:
       | In Rust you can use these almost everywhere in place of an
       | expression (if statements, while loops, in function arguments,
       | etc). It can sometimes make code a bit unreadable to some people,
       | but it's still a cool feature imo.
        
         | masklinn wrote:
         | Conveniently in Rust because a block is an expression it lets
         | you "convert" a statement (or sequence thereof) into an
         | expression.
         | 
         | To be used sparingly, but very useful when it applies e.g. in
         | the precise capture clause pattern, or for non-trivial object
         | initialisation (as Rust doesn't have many literals or literal-
         | ish macros).
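
One common reading of the "precise capture clause pattern" mentioned above is a block that prepares exactly what a `move` closure should capture. This is a sketch under that assumption; `sum_on_thread` is a hypothetical helper.

```rust
use std::sync::Arc;
use std::thread;

/// Sums `data` on a worker thread while the caller keeps its own handle.
fn sum_on_thread(data: &Arc<Vec<u64>>) -> u64 {
    let handle = thread::spawn({
        // Clone inside the block so the closure moves only this clone;
        // the shadowing `data` is invisible outside the braces.
        let data = Arc::clone(data);
        move || data.iter().sum::<u64>()
    });
    handle.join().unwrap()
}

fn main() {
    let data = Arc::new(vec![1, 2, 3, 4]);
    println!("sum = {}, still usable: {:?}", sum_on_thread(&data), data);
}
```

The block is an expression whose value is the closure, so the setup and the capture stay together without adding a temporary name to the outer scope.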
        
           | puffoflogic wrote:
           | Here's one notable use that I doubt many are familiar with:
           | 
           |     let iter = some_iterator_expression;
           |     {iter}.nth(2)
           | 
           | For some reason, nth takes &mut self instead of self (note
           | that T: Iterator => &mut T: Iterator<Item=T::Item> so there
           | was no need for this in the API design to support having
           | access to the iterator after exhaustion; it was a mistake).
           | So if you tried to use it like iter.nth(2) with the non-mut
           | binding, that would fail. But {iter} forces iter to be moved
           | - unlike (iter) which would not. Then iter becomes a
           | temporary and we're free to call a &mut self method on it.
           | 
           | In general: {expression} turns a place expression into a
           | value expression, i.e. forcing the place to be moved from.
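
A compilable sketch of the `{iter}` trick described above. The generic helper `third` is hypothetical. Note that the block is used on the right-hand side of a `let`, because a statement beginning with `{` is parsed as a standalone block.

```rust
/// Returns the third item of an iterator held in an immutable binding.
fn third<I: Iterator>(iter: I) -> Option<I::Item> {
    // `iter.nth(2)` would not compile here: `nth` takes `&mut self`, but
    // `iter` is not declared `mut`. `{ iter }` moves it into a temporary,
    // and a temporary may be mutably borrowed.
    let item = { iter }.nth(2);
    item
}

fn main() {
    println!("{:?}", third(10..));
}
```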
        
       | owaislone wrote:
       | I love this feature. I use it most of the time in large functions
       | where I can group some lines as one logical piece but not generic
       | enough to be a function. It helps with reading the code back and
       | prevents leaking vars into rest of the function scope. Such
       | blocks end up becoming scattered single use functions when
       | extracted out which isn't ideal.
        
       | merb wrote:
       | I'm not sure if this is the best advice for Rust, since you will
       | be playing with variable lifetimes then.
        
         | kibwen wrote:
         | In the context of Rust, using a block can only strictly reduce
         | the lifetime of a local, so if that would lead to a problem it
         | would just not compile. :P
        
         | saghm wrote:
         | This used to be standard practice in Rust before NLL (non-
         | lexical lifetimes) were implemented. As a trivial example,
         | here's some code that works on Rust today: https://play.rust-
         | lang.org/?version=stable&mode=debug&editio...
         | 
         | However, try to compile this with Rust 1.0 (which you can get
         | by running `rustup update 1.0.0` and then using `cargo +1.0.0
         | run` or `rustc +1.0.0`) and you'll get an error saying that you
         | can't push another element onto the vector due to it already
         | being borrowed by the slice. This is because the borrow checker
         | previously assumed that any borrow would remain in use for the
         | remainder of the scope. The "fix" to this was to manually put
         | in a block to "end" the borrow early. However, a lot of work
         | was done to allow the borrow checker to be more sophisticated,
         | and in Rust 1.31 (near the end of 2018: https://blog.rust-
         | lang.org/2018/12/06/Rust-1.31-and-rust-201...), that work
         | landed: the compiler can now recognize that a borrow is no
         | longer used before the end of a scope, and therefore allows
         | subsequent borrows that it previously would have considered
         | conflicting.
         | 
         | All that being said, there's still a useful feature of blocks
         | in Rust that I don't see mentioned in this blog post: blocks in
         | Rust are actually expressions! By default, the last value in a
         | Rust block will be yielded, but you can also `break` out of a
         | labeled block with a value earlier (similar to how the return
         | value of a Rust function will be the last value, but you can
         | also explicitly `return` earlier). As an added bonus, this
         | also works for `loop` blocks; since they will only ever
         | terminate if `break` is explicitly invoked (unlike `while` or
         | `for`), whatever value is specified will be yielded from the
         | loop.
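
A sketch of both forms mentioned above: a `loop` that yields a value through `break`, and a labeled block that exits early with a value. Breaking out of a plain block requires a label and Rust 1.65 or later. The function names are hypothetical.

```rust
/// Smallest power of two >= `n` (for small `n`), via a value-yielding `loop`.
fn next_power_of_two(n: u32) -> u32 {
    let mut candidate = 1;
    loop {
        if candidate >= n {
            break candidate; // the whole `loop` evaluates to this value
        }
        candidate *= 2;
    }
}

/// Classifies input using a labeled block with early `break`s.
fn classify(input: &str) -> &'static str {
    let kind = 'check: {
        if input.is_empty() {
            break 'check "empty";
        }
        if input.chars().all(|c| c.is_ascii_digit()) {
            break 'check "number";
        }
        "text" // fallthrough: the last expression is the block's value
    };
    kind
}

fn main() {
    println!("{} {}", next_power_of_two(5), classify("42"));
}
```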
        
       | Anaminus wrote:
       | This is a thing in Lua as well, with do-end blocks:
       | 
       |     local foo
       |     do
       |         local bar = 42
       |         -- Same as `foo = function() ... end`, so this sets the
       |         -- local foo variable.
       |         function foo()
       |             return bar
       |         end
       |     end
       | 
       | Because Lua has no significant whitespace, it can be made to look
       | like some kind of specific syntax:
       | 
       |     local foo do
       |         local bar = 42
       |         function foo()
       |             return bar
       |         end
       |     end
       | 
       | Though I think this is too clever, so I like to insert a
       | semicolon to make it clear what is happening:
       | 
       |     local foo; do
       |         local bar = 42
       |         function foo()
       |             return bar
       |         end
       |     end
        
       | maest wrote:
       | Other people in the comments have mentioned you can use functions
       | to achieve some of the benefits of the {blocks}.
       | 
       | I wanted to point out how q/kdb handles this, because I think
       | it's quite nice.
       | 
       | In kdb, blocks are how you define functions.
       | 
       |     {x+1};
       | 
       | That is a function which takes your argument, adds one to it and
       | returns it. (x is the default name for the function argument,
       | another lovely piece of design).
       | 
       | If you want to have a named function, just assign it to a
       | variable:
       | 
       |     my_inc: {x+1};
       | 
       | Now you can call my_inc(1) and get back 2.
       | 
       | Light, consistent and reusable language design, very nice. No
       | need to have two ways of defining functions (e.g. the needless
       | separation of def and lambdas in Python).
       | 
       | The upshot of this is that you get these block constructs for
       | free:
       | 
       |     myvar_a: 1;
       |     myvar_b: 2;
       |     my_top_level_var: {
       |       //less important work
       |     }[];
       | 
       | (The one downside is that you need to call the function with the
       | [] brackets)
        
         | surprisetalk wrote:
         | Wow, super cool design decision!
         | 
         | Can you give multiple named variables? How would you write a
         | function like this?
         | 
         |     const f = (a,b,c) => { return a * b + c }
         | 
         | Also, is this just q/kdb, or does this also apply to K?
        
           | leprechaun1066 wrote:
           | K4 is the implementation language of q, kdb is the database
           | part of the language. Most q is just named utility functions
           | written in K4. There isn't really much difference in what the
           | machine does with them under the covers, but when stuck on a
           | problem talking about q is more likely to get help than k.
           | 
           | You can provide up to 8 named inputs in a function
           | definition.
           | 
           | q example for running sum of 8 inputs:
           | 
           |     q)f:{[a;b;c;d;e;f;g;h]sums a,b,c,d,e,f,g,h}
           | 
           | same in K4:
           | 
           |     q)\
           |       f:{[a;b;c;d;e;f;g;h]+\a,b,c,d,e,f,g,h}
        
           | maest wrote:
           | The default variable names go all the way up to z. So I would
           | write your function as:
           | 
           |     f:{z+y*x}
           | 
           | (k is evaluated strictly right to left - another design
           | decision that I quite like -, so I had to move the variables
           | around in the expression)
           | 
           | If you wish, you can provide your own variable names as
           | follows:
           | 
           |     f:{[a;b;c]c+b*a}
           | 
           | This works in k and q (q is largely k with some nice-to-have
           | functions defined on top).
        
       | bxparks wrote:
       | I find this useful in unit tests where I often find myself
       | duplicating blocks of assertions multiple times. The {block}
       | helps to prevent accidentally reusing variables which are defined
       | by previous blocks of assertions.
       | 
       | I rarely use a free-standing {block} in actual code. I think it's
       | because if something is worthy enough to be logically grouped
       | into a {block}, then it is probably worthwhile to pull it out
       | into its own function, method or lambda expression.
        
         | ithkuil wrote:
         | I use blocks in rust in normal code as a way to control how
         | long a lock is held. Not sure if that's the best practice
        
           | tialaramex wrote:
           | That makes sense; you can instead explicitly `drop` the lock
           | guard, but the end of the block scope will do that for you.
           | 
           | If you aren't already it's perhaps better to identify an
           | object that's being locked, which you can have the Mutex
           | wrap. So e.g. you could have a Mutex<Goose>, and then
           | functions, even methods can take a reference to a Goose to
           | ensure you can't call them by mistake without locking the
           | Mutex - as you otherwise don't have a Goose. If the Goose
           | doesn't need any actual data this will be free at runtime,
           | the compiler type check ensures you took the lock as needed
           | but since it's a Zero Size Type no machine code is emitted to
           | deal with a Goose variable.
           | 
           | Probably your application has a better name for what is being
           | protected than Goose, that's just an example, but having some
           | object that is locked can help ensure you get the locking
           | right and that your mental model of what is being "locked" is
           | coherent.
           | 
           | Of course sometimes there really is no specific thing being
           | locked, even an imaginary one, it's just lock #4 or whatever
           | but in my experience that's rare.
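
A rough sketch of the `Mutex<Goose>` idea above, keeping the commenter's placeholder name. `honk` and `honk_twice` are hypothetical. The guarantee also assumes that a `Goose` cannot be constructed outside the mutex (for example, by keeping its constructor private to one module), which this small example does not enforce.

```rust
use std::mem::size_of;
use std::sync::Mutex;

/// Zero-sized token: holding a `&Goose` is evidence that the lock is held.
struct Goose;

/// Stands in for work that must only happen under the lock. The `&Goose`
/// parameter is what ties a call to holding the guard.
fn honk(_proof: &Goose, log: &mut Vec<&'static str>) {
    log.push("honk");
}

fn honk_twice(lock: &Mutex<Goose>, log: &mut Vec<&'static str>) {
    let goose = lock.lock().unwrap();
    // `&goose` is a `&MutexGuard<Goose>`, which coerces to `&Goose`.
    honk(&goose, log);
    honk(&goose, log);
} // guard dropped here: lock released

fn main() {
    let lock = Mutex::new(Goose);
    let mut log = Vec::new();
    honk_twice(&lock, &mut log);
    println!("{:?}, Goose occupies {} bytes", log, size_of::<Goose>());
}
```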
        
       ___________________________________________________________________
       (page generated 2023-01-25 23:01 UTC)