hngopher.com

       [HN Gopher] The Norway Problem
       ___________________________________________________________________
        
       The Norway Problem
        
       Author : dedalus
       Score  : 576 points
       Date   : 2021-04-02 13:12 UTC (1 days ago)
        
 (HTM) web link (hitchdev.com)
 (TXT) w3m dump (hitchdev.com)
        
       | gitowiec wrote:
       | I don't like YAML because when I need to write configuration in
       | it I waste time to remember what is the syntax. I have much
       | better understanding of JSON, because I use it almost on daily
       | basis.
        
       | vanshg wrote:
       | Why not just use Python itself for storing configurations? You
       | can be explicit about the data type and no need to parse anything
        
       | DennisP wrote:
       | Something that used to plague me is that I had database processes
       | importing Excel docs from clients, and if the first few rows in a
       | column were numbers, SQLServer assumed that all the values must
       | be numbers. Then it would run into cells containing other
       | strings, and instead of revising its assumption, it would just
       | import them as null. Since clients often didn't have great data
       | hygiene, it was a problem.
       | 
       | I finally solved it by exporting to csv, and using third-party
       | software that handled its own import and did it correctly.
        
       | dpratt71 wrote:
       | I don't understand why Haskell gets brought up in the middle of
       | an otherwise interesting and useful article. This sort of thing
       | cannot happen in Haskell. And while Haskell is not universally
       | admired, I can't recall seeing Haskell's flavor of type inference
       | being a reason why someone claimed to dislike Haskell.
        
       | sdfhbdf wrote:
       | What I am most baffled by with Yaml is the fact that it's a
       | superset of JSON.
       | 
       | Whenever an input accepts YAML you can actually pass in JSON
       | there and it'll be valid
       | 
       | It really surprised me when I found out and I use JSON Whenever
       | possible since then since it's much stricter
       | 
       | https://en.m.wikipedia.org/wiki/JSON#YAML
        
         | dragonwriter wrote:
         | > Whenever an input accepts YAML you can actually pass in JSON
         | there and it'll be valid
         | 
         | Strictly speaking, this is only true of YAML 1.2, not YAML
         | 1.0-1.1 (the article here addresses YAML 1.1 behavior, the
         | headline example od which was removed ib YAML 1.2 twelve years
         | ago), though it calla YAML 1.1 "YAML 2.0", which doesn't
         | actually exists.
         | 
         | Of course, there are lots of features, like custom types, that
         | JSON doesn't support, but you can still use YAML's JSON-style
         | syntax instead of actual JSON, for them.
        
         | alephu5 wrote:
         | Yes this is usually the best way. If you need some features for
         | code reuse there are several preprocessors. I personally use
         | Dhall to configure everything and then convert it to JSON for
         | my application to consume. It is a lot more powerful than YAML
         | and has a very safety-oriented type system.
        
         | norrius wrote:
         | > Whenever an input accepts YAML you can actually pass in JSON
         | there and it'll be valid
         | 
         | ...unless your parser strictly implements YAML 1.1, in which
         | case you should be careful to add whitespace around commas (and
         | a few other minor things). This is a valid JSON that some YAML
         | parsers will have problems with:
         | {"foo":"bar","\/":10e1}
         | 
         | The very first result Google gives me for "yaml parser" is
         | https://yaml-online-parser.appspot.com, which breaks on the
         | backslash-forward slash sequence.
        
       | knorker wrote:
       | NO problem.
        
       | lifeisstillgood wrote:
       | Hang on. The strict model seems off.
       | 
       | In the first model entering
       | 
       | GB 9.3
       | 
       | gets you a string and a number.
       | 
       | But the second gets you two strings?
       | 
       |  _Both_ are wrong in my opinion.
       | 
       | "GB" 9.3
       | 
       | is the correct approach
       | 
       | Explicit beats implicit every time.
        
       | Waterluvian wrote:
       | I have never gotten far into a project and thought, "my config
       | files are too verbose. I wish there were clever shorthands."
       | 
       | Does Yaml have any sort of strict mode?
       | 
       | I imagine I could find a linter that disallows implicit strings.
        
         | exyi wrote:
         | Not YAML by itself, but there are libraries that parse a YAML-
         | like format that is typed. For example this one:
         | https://hitchdev.com/strictyaml/. Technically, it is not
         | compatible with the YAML spec.
        
       | yakshaving_jgt wrote:
       | > it's equally true that extremely strict type systems require a
       | lot more upfront and the law of diminishing returns applies to
       | type strictness - a cogent answer to the question "why is so
       | little software written in haskell?"
       | 
       | I was with the article up until that point. I don't agree that
       | diminishing returns with regards to type strictness applies
       | linearly. Term-level Haskell is not massively harder than writing
       | most equivalent code in JavaScript -- in fact I'd say it's easier
       | and you reap greater benefit. Perhaps it's a different story when
       | you go all-in on type-level programming, but I'm not sure that's
       | what the author was getting at. This smells of the _Middle
       | Ground_ logical fallacy to me. Or of course the comment was
       | tongue-in-cheek and I 'm overreacting.
        
         | choeger wrote:
         | That law of diminishing returns might actually apply, I am not
         | 100% sure. But more powerful type systems allow for the more
         | complex composition of more complex interfaces in a safe
         | manner. Think of higher-level modules and data structures. Or
         | dependent types and input handling. Or linear types and
         | resource handling.
        
         | samvher wrote:
         | I agree. I would say that Erlang goes ~80% of the way compared
         | to Haskell's type system and the last 20% really matter, to the
         | point that in many cases I find myself not really using
         | Erlang's (optional) type system at all. Better type coverage
         | and more descriptive types allow the compiler to infer more and
         | I'd say this is the opposite of diminishing returns.
        
         | 7952 wrote:
         | I had to rewrite some JavaScript code in Postgres recently that
         | measured the overlap between different elevation ranges. In JS
         | I had to write it myself and deal with the edge cases and bugs.
         | In Postgres I just use the range type and some operators. It
         | was brilliant in comparison. The tiny effort of learning it was
         | worth it. The list of data types I use all the time is bigger
         | than just string, numbers and booleans. Serialisation formats
         | should support them. Particularly as there are often text
         | format standards that already exist for a lot of them. Give me
         | wkt geometry and iso formatted dates. It's not that difficult
         | and totally with it.
        
       | kstenerud wrote:
       | The worst tragedy of this is the security implications of subtly
       | different parsers. As your application surface increases, you're
       | likely to mix languages (and thus different parsers), which means
       | that the same input data will produce different output data
       | depending on whether your parser replaces, truncates, ignores, or
       | otherwise attempts to automatically "fix up" the data. A
       | carefully crafted document could exploit this to trick your data
       | storage layer into storing truncated data that elevates
       | privileges or sets zero cost, while your access control layer
       | that ignores or replaces the data is perfectly happy to let the
       | bad document pass by.
       | 
       | And here's something else to keep you up at night: Just think of
       | how many unintentional land mines lurk in your serialized data,
       | waiting to blow up spectacularly (or even worse, silently) as
       | soon as you attempt to change implementation technologies!
       | 
       | This is why I've been so anal about consistent decoder behavior
       | in Concise Encoding https://github.com/kstenerud/concise-
       | encoding/blob/master/ce...
       | 
       | https://concise-encoding.org/
        
       | yellowapple wrote:
       | This is exactly why configuration/serialization formats should
       | make as few assumptions about value types as possible. Once
       | parsing's done, everything should be a string (or possibly a
       | symbol/atom, if the program ingesting such a file supports
       | those), and it should be up to the application to convert values
       | to the types it expects. This is Tcl's approach, and it's about
       | as sensible as it gets.
       | 
       | ...which is why it pains me to admit that in my own project for a
       | Tcl-like scripting/config language[1] I missed the float v.
       | string issue, so it'll currently "cleverly" return different
       | types for 1.2 (float) v. 1.2.3 (atom). Coincidentally, I started
       | work on a "stringy" alternative interpreter that hews closer to
       | Tcl's philosophy (to fix a separate issue - namely, to avoid
       | dynamically generating atoms, and therefore avoid crashing the
       | Erlang VM when given potentially-adversarial input), so I'm gonna
       | fix that case for at least the "stringy" mode (by emitting
       | strings instead of numbers, too), knocking out two birds with one
       | stone for the upcoming 0.3.0 release :)
       | 
       | ----
       | 
       | [1]: https://otpcl.github.io, for those curious
        
         | progval wrote:
         | > Once parsing's done, everything should be a string
         | 
         | Or give a schema to the parser, defining what type is expected
         | in each field.
        
           | kenshoen wrote:
           | Yes, that looks like a right way to handle this problem
           | without ignoring YAML spec. Define what to parse upfront.
        
         | dkersten wrote:
         | It's reasons like this that I want my configuration languages
         | to be explicit and unambiguous. This is why I use JSON or if I
         | want a human friendly format, TOML. Strings are always "quoted"
         | and numbers are always unquoted 1.2, it can never accidentally
         | parse one as the other. The convenience of omitting quotes is
         | just not worth the potential for ambiguity or edge cases to me.
        
           | [deleted]
        
       | eitland wrote:
       | There exists a couple of mainstream languages that are full of
       | these sorts of _interesting_ behavior, one of them is supposedly
       | cool and productive and the other is supposedly ugly and evil.
        
         | brohee wrote:
         | The "Wat?" Talk got quite a few example and is hilarious.
         | 
         | https://www.destroyallsoftware.com/talks/wat
        
         | drno123 wrote:
         | Python vs JavaScript?
        
           | speedgoose wrote:
           | Python vs PHP also.
        
             | pansa2 wrote:
             | > _full of these sorts of interesting behavior_
             | 
             | I don't think that applies to Python - it's quite strongly
             | (although not statically) typed. I agree that it does apply
             | to JavaScript and PHP.
        
               | eitland wrote:
               | Javascript and PHP is correct.
        
               | exyi wrote:
               | I think this applies to Python pretty well. Although
               | certainly not as bad as PHP, most JS traps also exist in
               | Python (falsy values, optional glitchy semicolons,
               | function scoped variables, mutable closure). There is
               | many JS specific traps like this and also other Python
               | specific ones (like static fields are also instance
               | fields, Python versions and library dependency hell).
               | However I find it easier to avoid them in JS than in
               | Python with TypeScript, avoiding classes, ...
        
         | lunfard00 wrote:
         | and yet I don't see anyone complain about bash which is
         | arguably far worse than those 2. When things get hard on bash,
         | you will start to see python scripts on CI and whole thing is
         | complete unreadable mess
        
           | masklinn wrote:
           | > I don't see anyone complain about bash
           | 
           | You're not looking really hard then, but really
           | 
           | > When things get hard on bash, you will start to see python
           | scripts
           | 
           | That's kinda the thing innit? Unless the system specifically
           | only allows shell scripts (something I don't think I've ever
           | encountered though I'm sure it exists) it's quite easy to
           | just use something else when bash sucks, so while people will
           | absolutely complain about it they also have an escape: don't
           | use bash.
           | 
           | When a piece of software uses YAML for its configuration
           | though, you don't really have such an option.
           | 
           | Furthermore, bash being a relatively old technology people
           | know to avoid it, or what the most common pitfalls are.
           | Though they'll still fall into these pitfalls regularly.
        
             | lunfard00 wrote:
             | There is a lot of elitism around bash, like the "Arch btw"
             | thing but far worse because a lot of important things
             | depends on it.
             | 
             | Powershell has been working on linux for quite a while now
             | and doesnt seem get any attention even when it has a nice
             | IDE support and copy the good things about bash.
        
               | lokedhs wrote:
               | It doesn't copy all the good things about the Unix shell
               | though.
               | 
               | The reason people are comfortable with the POSIX shell is
               | because you use the same syntax for typing commands
               | manually as you do for scripts. But, you're going to have
               | a hard time finding people who prefers writing:
               | Remove-Item some/directory -recursive
               | 
               | Rather than                   rm -fr some/directory
               | 
               | People who write shellscripts are often not seeing
               | themselves writing a "program". They are just automating
               | things they would do manually. Going to an IDE in this
               | case is not something you'd consider.
               | 
               | I happen to be very aware of all the pitfalls in POSIX
               | shell, and it's rare that I see a shellscript where I
               | cannot immediately point out multiple potential problems,
               | and I definitely agree that most scripts should probably
               | be written in a language that doesn't contain so many
               | guns aimed at the user's feet. I'm just pointing out a
               | likely reason why people are not adopting powershell in
               | the huge numbers that Microsoft may have hoped for.
        
               | majkinetor wrote:
               | Nonsence. This is the same in powershell:
               | rm -r -f some/directory
        
           | disgruntledphd2 wrote:
           | Bash is a total disaster, I complain about it all the time.
           | Unfortunately, rather like JS, it's unavoidable.
        
           | eitland wrote:
           | I'd not consider bash a
           | 
           | 1. mainstream
           | 
           | 2. programming language
           | 
           | (of course _technically_ it is a programming language, but it
           | is also more precisely a scripting language)
        
       | earthboundkid wrote:
       | It's fashionable to hate XML because it was used in a lot of
       | places it was a bad fit in the 00s, but at least it's a pretty
       | good document language.
       | 
       | YAML though is always a bad fit. If you want machine readable
       | config, use JSON; human readable, use TOML. When does YAML ever
       | fit?
        
       | suttree wrote:
       | Reminds me that the reasoning behind austerity came from an Excel
       | calculation that didn't include all the relevant rows :~/
       | 
       | https://www.theguardian.com/politics/2013/apr/18/uncovered-e...
       | 
       | https://www.bbc.co.uk/news/magazine-22223190
       | 
       | https://theconversation.com/the-reinhart-rogoff-error-or-how...
        
       | blunte wrote:
       | If you want no misunderstandings, be explicit. This applies to
       | YAML and life in general. There's an annoying but fairly accurate
       | saying about assumptions that applies.
       | 
       | If you want something to be a specific type, you better have an
       | explicit way of indicating that. If you say quotes will always
       | indicate a string, great. Of course we know it's not that simple,
       | since there are character sets to consider.
       | 
       | The safest answer is to do something like XML with DTDs. But that
       | imposes a LOT of overhead. Naturally we hate that, so we make
       | some "convention over configuration" choices. But eventually, we
       | hit a point where the invisible magic bites us.
       | 
       | This is one case where tests would catch the problem, if those
       | tests are thorough enough - explicitly testing every possibility
       | or better yet, generative testing.
        
         | korijn wrote:
         | Or just opening your browser and trying out norwegian on a QA
         | environment.
        
       | thrower123 wrote:
       | I've never seen anything that used YAML that I didn't want to
       | douse with gasoline, nuke from orbit, and then salt the ground
       | where it once stood.
       | 
       | I cry and rage and rend my clothes when I stumble upon some new
       | thing that makes me have to use it.
        
       | dangoor wrote:
       | Cue also solves this problem. The "no" example is right on the
       | front page: https://cuelang.org
       | 
       | I used it for configuration of a Go program recently and found it
       | pleasant to work with. I hope the language is declared stable
       | soon, because it's a good model.
        
       | [deleted]
        
       | mcv wrote:
       | If it ignores part of the spec, I don't think "strictyaml" is the
       | correct name here. Instead, if it interprets everything as
       | string, perhaps "stringyaml" would have been more accurate,
       | though I'm sure that's not as good PR.
       | 
       | I'm reminded of the discussion we had a few days ago about
       | environment variables; one problem there is that env variables
       | are always strings, and sometimes you do want different types in
       | your config. But clearly having the system automatically
       | interpret whether it's a string or something else is a major
       | source of bugs. Maybe having an explicit definition of which
       | field should be which type would help, but then you end up with
       | the heavy-handed XML with its XSD schema.
       | 
       | Or you just use JSON, which is light-weight, easy to read, but
       | unambiguous about its types. I guess there's a good reason it's
       | so popular.
       | 
       | Maybe other systems like yaml and environment variables should
       | only ever be used for strings, and not for anything else, and I
       | suppose replacing regular yaml with 'strictyaml' could play a
       | role there. Or cause unending confusion, because it does violate
       | the spec.
        
         | povik wrote:
         | "saneyaml" would not make for bad PR
        
         | marcinzm wrote:
         | >If it ignores part of the spec, I don't think "strictyaml" is
         | the correct name here.
         | 
         | The article didn't fully explain it but strictyaml requires a
         | typed schema or defaults to string (or list or dict) if one is
         | not provided. So it strictly follows the provided schema.
        
           | mcv wrote:
           | That makes a big difference indeed. It wasn't clear to me
           | from the article, but string yaml + optional schema sounds
           | like a useful combination.
        
         | msiemens wrote:
         | > JSON, which is [...] unambiguous about its types
         | 
         | With the one exception that with floatig point values the
         | precision is not specified in the JSON spec and thus is
         | implementation defined[1] which may lead to its own issues and
         | corner cases. It for sure is better than YAML's 'NO' problem,
         | but depending on your needs JSON may have issues as well
         | 
         | [1]: https://stackoverflow.com/questions/35709595/why-would-
         | you-u...
        
           | wongarsu wrote:
           | Also JSON's complete lack of many commonly used types, and no
           | way to define any new ones.
        
             | mcv wrote:
             | Isn't that a problem with most of these config languages,
             | though? XML is the only one where I think this might be
             | possible.
        
               | wongarsu wrote:
               | Allowing you to define types is quite uncommon, but many
               | config languages allow more types than JSON (so more than
               | boolean, number, string, list, dict). Date datatypes are
               | a big one and are provided by about every second JSON
               | variant, in addition to TOML, ION and others.
        
       | lenkite wrote:
       | Why does YAML accept unquoted strings ? Be Strict. Be Safe.
        
       | Hendrikto wrote:
       | > The real fix requires explicitly disregaring the spec
       | 
       | Or... just quote your strings.
        
         | dragonwriter wrote:
         | Or, "use an appropriate schema". Or, for several of the
         | specific problems identified in the source article, use YAML
         | 1.2 (2009) instead of YAML 1.1 (2005), which the article
         | misidentifies as "YAML 2.0" and acts as if it is the current
         | spec.
        
       | andrewclunn wrote:
       | oh no, we want this value to be parsed as a string, so we need to
       | put quotes around it. the humanity!
        
       | jmull wrote:
       | > "While the website went down and we were losing money we chased
       | down a number of loose ends until finally finding the root
       | cause."
       | 
       | Hopefully not a real story. If you're trying out new
       | configurations in production and have no mechanism to rollback
       | problematic changes, you've got bigger problems than YAML.
       | 
       | To me, though, YAML, including "StrictYAML" doesn't solve any
       | problems JSON, perhaps w/comments, already solves.
        
       | RcouF1uZ4gsC wrote:
       | YAML seems like a really neat idea, but over time, I have I have
       | come to regard it as being too complicated for me to use for
       | configuration.
       | 
       | My personal favorite is TOML, but I would even prefer plain JSON
       | over YAML
       | 
       | The last thing I want at 2 AM when trying to look figure out if
       | an outage is due to a configuration change is having to think if
       | each line of my configuration is doing the thing I want.
       | 
       | YAML prizes making data look nicely formatted over simplicity or
       | precision. That for me, is not a tradeoff, I am willing to make.
        
         | Arnavion wrote:
         | They all have their downsides.
         | 
         | JSON:
         | 
         | - no comments, unless you fake them with fake properties,
         | unless your configuration has a schema that doesn't allow extra
         | fake properties
         | 
         | - no trailing commas; makes editing more annoying
         | 
         | - no raw strings
         | 
         | YAML:
         | 
         | - the automatic type coercion
         | 
         | - the many ways to encode strings ( https://yaml-
         | multiline.info/ )
         | 
         | - the roulette wheel of whether _this_ particular parser is
         | anal about two-space indentation or accepts anything as long as
         | it 's used consistently
         | 
         | - the roulette wheel of whether _this_ particular parser
         | supports uncommon features like anchors
         | 
         | TOML:
         | 
         | - runtime footguns in automated serialization (
         | https://news.ycombinator.com/item?id=24853386 )
         | 
         | - hard to represent deeply-nested structures, unless you switch
         | to inline tables which are like JSON but just different enough
         | to be annoying
        
           | anticristi wrote:
           | This makes me sad. It's 2021 and we still haven't figure out
           | how to serialize configuration in a format that is easy-to-
           | edit and predictable.
        
             | kstenerud wrote:
             | This is the problem space I'm targeting with
             | https://concise-encoding.org/
             | 
             | * Text AND binary so that humans can edit easily, and
             | machines can transmit energy and bandwidth efficiently.
             | 
             | * Carefully designed spec to avoid ambiguities (and their
             | security implications).
             | 
             | * Strong type support so you're not using all kinds of
             | incompatible hacks to serialize your data.
             | 
             | * Versioned, because there's no such thing as the perfect
             | format.
             | 
             | * Also, the website is 32k bytes ;-)
        
               | yyyk wrote:
               | + Has binary format.
               | 
               | + Avoids ambiguities.
               | 
               | - The format seems to feel the need to support
               | _everything_ , including things I am not sure are actual
               | usecases (what's the point of Markup element for example?
               | What does Metadata save us compared to just including it
               | in document, given that parsers must parse it anyway?).
               | This must make implementation most complex and costly,
               | and makes reading the text format more difficult.
               | 
               | - Not a fan of octal notation. At 3am not sure I can't
               | confuse 0 and o given certain fonts. Does anyone even use
               | it these days?
               | 
               | - Unquoted string were discussed in the thread, I'd like
               | to point out that it's very easy to make an unquoted
               | string not "text-safe" (according to the spec) without
               | noticing it, at which point document is invalid.
               | 
               | Just add white-space (maybe a user pasted a string from
               | somewhere without noticing whitespace at the end or
               | forgot the rules), a dot, an exclamation or a question
               | mark. Having surprises like that is IMHO worse than a
               | consistent quoting method.
               | 
               | Basically all the things I don't like are about the
               | format supporting a bit too much. YAML 1.1 should teach
               | us more is sometimes less.
        
               | kstenerud wrote:
               | Alright that's two votes against unquoted strings so far
               | (plus my wife agrees so that's three against!)
               | 
               | I put in octal because it was trivial to implement after
               | the others. The canonical format when it's stored or
               | being sent is binary, and a decoder shouldn't be
               | presenting integers in octal (that would just be weird).
               | But a human might want octal when inputting data that
               | will be converted to the binary format.
               | 
               | Markup is for presentation data, UI layouts, etc, but
               | with full type support rather than all the hacky
               | XML+whatever solutions that many UI toolkits are
               | adopting. Also, having presentation data in binary form
               | is nice to have.
        
               | yyyk wrote:
               | Well, unquoted strings work when a format is built for
               | that. If the default was "it's text unless we see the
               | special sequences" it would be better for unquoted
               | strings. But even then there are too many special
               | characters in this format IMHO.
               | 
               | I saw there's a 'Media' type in the spec. It's seems the
               | type is actually for serializing files. But there's no
               | "name" (or we can call it "description") field. Of course
               | we could accomplish this with a separate field - but than
               | again the entire type's functionality could be
               | accomplished with a u8x array and a string field. So if
               | you're specifying this type at all, might as well add a
               | name field to make it useful.
        
               | chousuke wrote:
               | I'm skimming through the human readable spec, and it
               | seems decent, but I noticed the spec allows unquoted
               | strings. What's the reasoning for this? In my experience
               | unquoted strings cause nothing but trouble, and are
               | confusing to humans who may interpret them as keywords.
               | 
               | Any reason for not using RFC2119 keywords in the spec?
               | Using them should make the spec easier to read.
        
               | kstenerud wrote:
               | > I noticed the spec allows unquoted strings. What's the
               | reasoning for this? In my experience unquoted strings
               | cause nothing but trouble, and are confusing to humans
               | who may interpret them as keywords.
               | 
               | Unquoted strings are much nicer for humans to work with.
               | All special keywords and object encodings are prefixed
               | with sigils (@, &, $, #, etc), so any bare text starting
               | with a letter is either a string or an invalid document,
               | and any bare text starting with a numeral is either a
               | number or an invalid document.
               | 
               | > Any reason for not using RFC2119 keywords in the spec?
               | Using them should make the spec easier to read.
               | 
               | I use a superset of those keywords to give more precision
               | in meaning: https://github.com/kstenerud/concise-
               | encoding/blob/master/ce...
        
               | chousuke wrote:
               | If strings are always unambiquously detectable, why allow
               | quoting them at all? Having two representations for the
               | same data means you can't normalize a document
               | unambiguously. I can understand having barewords seems
               | cleaner for things like map keys, but I am not convinced
               | that it's a worthwhile tradeoff.
               | 
               | An important feature of RFC2119 keywords is that they're
               | always capitalized (ie. the keyword is "MUST", not
               | "Must", or "must"). This makes requirements and
               | recommendations stand out amid explanatory text,
               | improving legibility. For example, RFC2119 itself uses
               | MUST and must with different meanings.
        
               | kstenerud wrote:
               | > If strings are always unambiquously detectable, why
               | allow quoting them at all?
               | 
               | Because strings can contain whitespace and other
               | structural characters that would confuse a parser.
               | 
               | > Having two representations for the same data means you
               | can't normalize a document unambiguously.
               | 
               | The document will always be normalized unambiguously in
               | binary format. The text format is a bit more lenient
               | because humans are involved.
               | 
               | The idea is that the binary format is the source of
               | truth, and is what is used in 90% of situations. The text
               | format is only needed as a conduit for human input, or as
               | a human readable representation of the binary data when
               | you need to see what's going on.
               | 
               | > An important feature of RFC2119 keywords is that
               | they're always capitalized (ie. the keyword is "MUST",
               | not "Must", or "must").
               | 
               | Hmm good point. I'll add that.
        
               | anticristi wrote:
               | Nice! I like some concepts that this format proposes, but
               | the `@` and `|` modifier feels a bit too "loaded".
        
               | kstenerud wrote:
               | It's a compromise; there are only so many letters,
               | numbers, and symbols available in a single keystroke on
               | all keyboards, and I don't want there to be any ambiguity
               | with numbers and unquoted strings (e.g. interpreting the
               | unquoted string value true as the boolean value true).
               | 
               | So everything else needs some kind of initiator and/or
               | container syntax to logically separate it from the other
               | objects when interpreted by a human or machine.
        
             | unhammer wrote:
             | https://dhall-lang.org/ ?
        
             | trhway wrote:
             | XML with a convenient UI tools to edit should have fit the
             | bill. Yet, for whatever reason a convenient UI tool would
             | never happen to be there when needed, and thus scared and
             | tired of manual editing of XML the world have embraced
             | YAML.
        
               | masklinn wrote:
               | > XML with a convenient UI tools to edit should have fit
               | the bill.
               | 
               | "You need this special tool to work" immediately and
               | instantly rules out "easy to edit". Or makes the debate
               | irrelevant: every format is easy to edit if you have "a
               | convenient UI" to do it for you.
        
               | anticristi wrote:
               | Opening XMLs in ZIP containers is easy! Just spin up
               | Word. :)
        
               | sergeykish wrote:
               | The fault was in XML editing, pure data authoring is
               | hard. We have convenient UI -- web browser, think of it
               | as literate programming, a way to merge man page and
               | configuration file.
               | 
               | And plain text editor is a "widely deployed special tool
               | to work". Actual data is                   countries:\n-
               | GB\n- IE\n- FR\n- DE\n- NO
               | 
               | Or                   636f 756e 7472 6965 733a 0a2d 2047
               | 420a         2d20 4945 0a2d 2046 520a 2d20 4445 0a2d
        
             | imhoguy wrote:
             | We had such: XML. With proper editor support it is easy. I
             | guess it needs rediscovery /s ;)
        
               | anticristi wrote:
               | I used XML and didn't like it:
               | 
               | - A proper editor was never around.
               | 
               | - Closing tags were verbose.
               | 
               | - Attributes vs tags was confusing.
               | 
               | - It didn't map "naturally" to common data types, like
               | lists, maps, integers, float, etc.
        
               | mattmanser wrote:
               | Don't forgot about namespaces, another fiddly bit of XML
               | that caused all sorts of problems and headaches.
        
               | sergeykish wrote:
               | You've just used XML tech as it was designed to post this
               | comment.
               | 
               | XML is serialization. I hardly believe you was concerned
               | about serialization while posting comment or thought
               | about attributes-tags distinction.
               | 
               | This page utilizes request to server for multi-user
               | editing. But it is easy to build truly serverless (like a
               | file) document with same interface:
               | data:text/html,<html><ul>Host: <span class=host
               | contenteditable>example.com
               | 
               | Change it, save it, done. Web handles input of lists,
               | maps, integers, float and much more.
        
               | anticristi wrote:
               | You are right. XML is great for encoding the DOM.
               | However, I didn't find it practical for interfacing with
               | humans, due to the concerns I raised.
        
               | sergeykish wrote:
               | It is not practical to edit plain text in binary:
               | 636f 756e 7472 6965 733a 0a2d 2047 420a         2d20 4945
               | 0a2d 2046 520a 2d20 4445 0a2d
               | 
               | It is not practical to edit Excel documents in plain
               | text:                   <?xml version="1.0"?>
               | <Workbook xmlns="urn:schemas-microsoft-
               | com:office:spreadsheet"           xmlns:o="urn:schemas-
               | microsoft-com:office:office"
               | xmlns:x="urn:schemas-microsoft-com:office:excel"
               | xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
               | xmlns:html="http://www.w3.org/TR/REC-html40">
               | <Worksheet ss:Name="Sheet1">             <Table>
               | <Row>                 <Cell><Data
               | ss:Type="String">ID</Data></Cell>
               | 
               | Tim Berners-Lee browser was browser-editor. Can't you see
               | parallels?
        
           | tgv wrote:
           | > - the automatic type coercion
           | 
           | Only when you "unmarshal" to an untyped data structure and
           | then make assumptions about the type. I've used yaml with a
           | go application, and it can't interpret NO as a bool when the
           | field is a string.
        
             | Arnavion wrote:
             | Correct, like TFA.
        
           | perlgeek wrote:
           | For hand-writing I love jsonnet, which produces JSON, is much
           | more convenient to write, and has some templating, functions
           | etc. https://jsonnet.org/
           | 
           | You wouldn't serialize data structures to jsonnet though,
           | you'd just generate JSON.
        
       | makkesk8 wrote:
       | We too had this problem, we solved it using the 3 letter country
       | code instead.
        
       | zomglings wrote:
       | Have had a similar issue when adding git revisions to YAML
       | documents.
       | 
       | The problem is that if a YAML parser sees a string like this:
       | 
       | "0123e04"
       | 
       | It interprets it as a number: 123 * 10^4
       | 
       | Our hacky solution was to prefix the revision hashes like
       | sha-0123e04, but still this was quite annoying.
       | 
       | After that experience, I have stopped using YAML for any of my
       | own configuration. Have started preferring putting my
       | configurations in code. And when I don't want that, have found
       | JSON good enough for my purposes.
        
       | bsenftner wrote:
       | This is such a core issue with a tool like YAML, how the hell did
       | this program get so popular? Are there that many developers willy
       | nilly using tools that fail in critical, silent ways, and the
       | horde of no-nothings follows them?
        
       | throwaway4good wrote:
       | Just use json.
        
       | dudeinjapan wrote:
       | You say Norway, I say Yesway.
        
       | korijn wrote:
       | Edit: downvoters, thanks! I realize this is not an easily
       | agreeable opinion ("let's all chant 'death to YAML!'") but it's
       | really easy to avoid losing money on something like this. Just do
       | proper testing.
       | 
       | Aren't you setting yourself up for surprises if you write file
       | formats such as TOML and YAML without reading the documentation,
       | learning and experimenting first? How about unit testing? Or
       | verifying the type in your config parser? Have you tried opening
       | your site in the norway config on your development or testing
       | environment? Or even in production? It all seems very basic and
       | not at all blog post or even HN worthy.
       | 
       | I'm going to assume the authors still haven't learned their
       | lesson and are going to experience many more surprises in the
       | future working with plain text file formats.
        
       | ancarda wrote:
       | You can catch this with yamllint
       | (https://github.com/adrienverge/yamllint):                   %
       | cat countries.yml          ---         countries:           - US
       | - GB           - NO           - FR              % yamllint
       | countries.yml          countries.yml           5:4       warning
       | truthy value should be one of [false, true]  (truthy)
        
       | IHLayman wrote:
       | Reminds me of the multiple YAML bugs that have plagued Kubernetes
       | such as https://github.com/kubernetes/kubernetes/issues/82296
       | 
       | It is interesting how the standard of any language seems to
       | diverge due to just the implementation from different parsers.
        
       | jgalt212 wrote:
       | another gotcha:
       | 
       | 2020-03-25 -> datetime.date(2020, 3, 25), not "2020-03-25"
        
       | paxys wrote:
       | I will never understand why YAML didn't just require quoted
       | strings. Did the creator not anticipate how many problems the
       | ambiguity would cause?
        
         | mattmanser wrote:
         | Never's a strong word, seems quite easy to understand why to
         | me. You've got ease of use reasons, historical reasons like the
         | mis-guided Robustness principle, etc.
         | 
         | And these sort of things happen time and time again.
         | 
         | And although officially JSON requires quoted strings, almost
         | none of the parsers actually enforce that, and so you will find
         | a huge amount of JSON out there that is not actually compliant
         | with the official spec.
         | 
         | Just like browsers have huge hacks in them to handle misformed
         | HTML.
        
           | Safety1stClyde wrote:
           | > And although officially JSON requires quoted strings,
           | almost none of the parsers actually enforce that
           | 
           | What programming language? I'm not familiar with those
           | parsers, the ones I know of very much do enforce quoted
           | strings.
           | 
           | > you will find a huge amount of JSON out there that is not
           | actually compliant with the official spec
           | 
           | The parsers I use all follow the current JSON RFC
           | specification, and I've never encountered any JSON from APIs
           | which they reject.
           | 
           | > Just like browsers have huge hacks in them to handle
           | misformed HTML.
           | 
           | Web browsers do deal with a variety of things, not so much
           | JSON parsers in my experience.
        
             | Macha wrote:
             | I think the point is that they accept more than the spec
             | dictates - do your JSON parsers accept e.g. the vs code
             | config file (JSON with comments) or JSON with unquoted
             | keys?
        
               | joelellis wrote:
               | The most commonly used parsers only accept valid JSON -
               | including the one included within most JS runtimes
               | (JSON.stringify/parse). VSCode explicitly uses a `jsonc`
               | parser, the only difference being that it strips comments
               | before it parses the JSON. There's also such thing as
               | `json5`, which has a few extra features inspired by ES5.
               | None of them are unquoted strings. I've never come across
               | anything JSON-like with unquoted strings other than YAML,
               | and everything not entirely compliant with the spec has a
               | different name.
        
               | jokethrowaway wrote:
               | Can you name a JSON parser which accept comments or
               | unquoted keys?
               | 
               | I've never seen one
        
               | cesarb wrote:
               | IIRC, Gson accepts unquoted keys.
        
       | groundCode wrote:
       | I've hit this exact same problem loading YAML in Ruby. Luckily
       | caught it before it hit prod, but still, it made me go argh for a
       | while.
        
       | abujazar wrote:
       | Norwegian here. I'd say the problem is YAML, not Norway :D
        
       | dan-robertson wrote:
       | Other reasons to not want types happening during parse time:
       | 
       | - "modified" numbers, e.g. $50, 35%, 1.2345568896347853246863477
       | 
       | - Dates. If your language tries to convert a date to Unix time or
       | Julian Day, you can have problems with time zones or distant or
       | historical dates.
       | 
       | - strings vs symbols. The person writing config shouldn't have to
       | care about this distinction.
       | 
       | - Automatic deduplication for fields of objects can be a problem.
        
       | [deleted]
        
       | mrzool wrote:
       | Wow, YAML has definitely some pretty quirky edges.
        
       | kleer001 wrote:
       | Ah, tool centric (not centered in the person) time-offset (the
       | ones making the original mistake doesn't see it) paredolia. Good
       | 'el TCTOP...
        
       | DoofusOfDeath wrote:
       | Funny coincidence. Around 2000, I worked for a company that
       | coined the term "Norway problem" for a different software
       | problem.
       | 
       | Their product used an MVCC database (I think ObjectStore). One of
       | their customers in Norway had a problem where updates to the
       | database seemed to not show up. IIRC the problem was a bug in
       | this company's software that caused MVCC to show an older version
       | of the database content than expected.
        
       | LeonM wrote:
       | I was bitten with this issue some time ago.
       | 
       | The Stripe library has constants for which type of VAT number is
       | supplied. One of those constants is 'NO_VAT'...
       | 
       | Needless to say, this caused me some grey hairs
        
       | schoen wrote:
       | Recent related HN discussion:
       | https://news.ycombinator.com/item?id=26365365
        
       | azernik wrote:
       | YAML had a worse example, once.
       | 
       | For the ease of entering time units YAML 1.1 parsed any set of
       | two digits, separated by colons, as a number in sexagesimal (base
       | 60). So 1:11:00 would parse to the integer 4260, as in 1 hour and
       | 11 minutes equals 4260 seconds.
       | 
       | Now try plugging MAC addresses into that parser.
       | 
       | The most annoying part is that the MAC addresses would _only_ be
       | mis-parsed if there were no hex digits in the string. Like the
       | bug in this post, it could only be reproduced with specific
       | values.
       | 
       | Generally, if you're doing implicit typing, you need to keep the
       | number of cases as low as possible, and preferably error out in
       | case of ambiguity.
        
         | m463 wrote:
         | slightly related, on my microwave 99 > 100, even 61 > 100
        
           | roland35 wrote:
           | I try to optimize my microwave button pushing too. I also
           | have a +30 seconds button, so for 1:30 I can hit
           | "1,3,0,Start" or "+30" three times and save a press!
        
           | bonzini wrote:
           | Why does your microwave compare numbers?
        
             | sokoloff wrote:
             | It doesn't compare them, it just counts down.
             | 
             | If I enter 1-3-0-start, I get 90 seconds of cooking. If I
             | enter 9-9-start, I get 99 seconds of cooking, so in that
             | sense, 99 > 130.
             | 
             | If I want about 90 seconds, I'll use 88 as it's faster to
             | enter (fewer finger movements).
        
               | JoeAltmaier wrote:
               | I've done the same thing for decades! Soul mates?
        
               | pdkl95 wrote:
               | Vi Hart - "How to Microwave Gracefully"
               | 
               | https://www.youtube.com/watch?v=T9E0zSpULFY
        
               | sokoloff wrote:
               | You might like this one as well.
               | 
               | Load soap into the dishwasher _after emptying_ rather
               | than after _loading_. If the soap dispenser is closed,
               | the dishes are dirty.
        
               | sneak wrote:
               | My rule is that loading the dishwasher means that one
               | loads all the available dishes, and runs it, even if it's
               | only x% full. We use the (large) sink as an input buffer.
               | 
               | If the dishwasher has dishes in it and it's not running,
               | they're clean.
        
               | rootusrootus wrote:
               | This is exactly our algorithm as all. I can't really
               | imagine flipping it the other way, since leaving dirty
               | dishes in a dishwasher will just let them completely dry
               | out, making it more likely they won't get fully clean
               | when the cycle is eventually run.
        
               | medstrom wrote:
               | Rinse until visually clean, then put in dishwasher.
        
               | teddyh wrote:
               | That's not a zero-copy algorithm. The algorithm with
               | using the soap dispenser being closed as a flag _is_
               | zero-copy.
        
               | corpMaverick wrote:
               | I want to have two dishwashers. One with the dirty dishes
               | and one with the clean dishes. So you never have to put
               | the dishes away. They go from the clean dishwasher to the
               | table to the dirty one. And then flip them.
        
               | sokoloff wrote:
               | There's a community near here with a high fraction of
               | Orthodox Jews. One condo I toured in my 20s had two
               | dishwashers and without thinking about why they did it, I
               | commented how I thought that was awesome that you'd never
               | need to put dishes away. (They of course installed two
               | dishwashers for orthodox separation of dishes from each
               | other.)
        
               | tjalfi wrote:
               | This idea comes up periodically on Reddit. [0] has a few
               | posts from people who have installed them, mostly for
               | bachelors.
               | 
               | [0] https://www.reddit.com/r/self/comments/ayr9c/when_im_
               | rich_im...
        
               | LilBytes wrote:
               | Blasphemy! I do the inverse. You're wrong. /s
               | 
               |  _insert code flame war here_
        
             | pdpi wrote:
             | Not the OP, but I have the same problem. For some reason
             | that escapes me, pressing the "10 sec" button 7 times
             | produces 00 70 instead of 01 10. If you then press the "1
             | min" button you get 01 70
        
               | nerdponx wrote:
               | Most microwaves (in the USA) do this, at least in my
               | experience.
               | 
               | They treat the ":" like a sum of two sexagesimal numbers,
               | rather than a sexagesimal digit separator.
        
             | mckirk wrote:
             | How else would you prove it's turing complete and can run
             | Doom?
        
         | whytookay wrote:
         | One that really surprised/confused me was that pyaml (and the
         | yaml spec) attempts to interpret any 0-prefixed string into an
         | octal number.
         | 
         | There was a list of AWS Account IDs that parsed just fine until
         | someone added one that started with a 0 and had no numbers
         | greater than 7 in it, after which our parser started spitting
         | out decidedly different values than we were expecting. Fixing
         | it was easy, but figuring out what in the heck was going on
         | took some digging.
        
         | adwn wrote:
         | > _For the ease of entering time units YAML 1.1 parsed any set
         | of two digits, separated by colons, as a number in sexagesimal
         | (base 60)._
         | 
         | This is a mind-boggling level of idiocy. Even leaving aside the
         | MAC address problem, this conversion treats "11:15" (= 675)
         | different from "11:15:00" (= 40500), even though those denote
         | the same time, while treating "00:15:00" (15 minutes past
         | midnight) and "15:00" (3 in the afternoon) the same.
        
         | dragonwriter wrote:
         | > YAML had a worse example, once.
         | 
         | It had it literally at the same time as it had the problem in
         | the article (the article refers to YAML 2.O, a nonexistent
         | spec, and to PyYAML, a real parser which supports only YAML
         | 1.1.)
         | 
         | Both the unquoted-YES/NO-as-boolean and sexagesimal literals
         | were removed in YAML 1.2. (As was the 0-prefixed-number-as-
         | octal mentioned in a sibling comment.)
        
         | lainga wrote:
         | We had a Grafana dashboard where one of the columns was a short
         | Git hash. One day, a commit got the hash `89e2520`, which
         | Grafana's frontend helpfully decided to display as "+infinity".
         | Presumably it was parsing 89E+2520.
        
           | rootusrootus wrote:
           | Ha, that reminds me of some work I was doing just yesterday,
           | implementing a custom dictionary for a postgres full text
           | search index. Postgres has a number of mappings that you can
           | specify, and it picks which one based on a guess of what the
           | data represents. I got bit by a string token in this same
           | format, because it got interpreted as an exponential number.
        
       | ravanave wrote:
       | Btw, the reason Haskell isn't used more isn't type system per se,
       | as all types can be inferred at the compilation time. People
       | would sometimes use this feature even to see if GHCi guesses the
       | type correctly (by correctly I mean exactly how the user wants,
       | technically it's correct always) first time and save them some
       | time writing it either with an extension or just copy&paste from
       | the interpreter window.
       | 
       | When it gets hairy is that most programming languages have low
       | entrance barrier. To write Haskell effectively you've got to
       | unlearn a lot of rooted bad habits and you get to dive into the
       | "mathematical" aspect of the language. Not only you got monads,
       | but there's plethora of other types you need to get comfortably
       | onboard with and the whole branch of mathematics talking about
       | types (you don't need to even know that such a field as category
       | theory exists to use it).
       | 
       | However, since most people just want to write X, or just want
       | hire a dev team at price they can afford, Haskell rarely is the
       | first choice language.
        
       | enaaem wrote:
       | Related xkcd
       | 
       | https://xkcd.com/327/
        
         | tpowell wrote:
         | Little Bobby Tables. I came here to post this.
        
       | lolinder wrote:
       | This comment was buried in a thread, but I'm bringing it out
       | because it's very relevant to the conversation:
       | 
       | https://news.ycombinator.com/item?id=26679728
       | 
       | > the article refers to YAML 2.O, a nonexistent spec, and to
       | PyYAML, a real parser which supports only YAML 1.1.
       | 
       | > Both the unquoted-YES/NO-as-boolean and sexagesimal literals
       | were removed in YAML 1.2.
        
       | keeperofdakeys wrote:
       | This is part of more general problem, they had to rename a gene
       | to stop excel auto-completing it into a date.
       | 
       | https://www.theverge.com/2020/8/6/21355674/human-genes-renam...
       | 
       | Edit: Apparently Excel has its own Norway Problem ...
       | https://answers.microsoft.com/en-us/msoffice/forum/msoffice_...
        
         | WalterBright wrote:
         | Good language design involves deliberately adding redundancy
         | which acts like a parity bit in that errors are more likely to
         | be detected.
        
           | richthegeek wrote:
           | That's an interesting statement to apply to natural
           | languages.
           | 
           | Consider this headline in English: "Man attacks boy with
           | knife". This can be read two ways, either the man is using a
           | knife to attack the boy, or the boy had the knife and thus
           | was being attacked.
           | 
           | The same sentence in Polish would make use of either genitive
           | or instrumental case to disambiguate (although barely).
           | However, a naive translation would only differ in the
           | placement of a `z` (with) and so errors could still slip
           | through. At least in this case the error would not introduce
           | ambiguity, simply incorrectness.
           | 
           | Similar to language design we can also consider: does the
           | inclusion/requirement of parity features reduce the
           | expressivity of the language?
        
             | tremon wrote:
             | _does the inclusion /requirement of parity features reduce
             | the expressivity of the language?_
             | 
             | This was a real eye-opener for me when learning Latin in
             | school: stylistic expressions such as meter, juxtaposition,
             | symmetry are so much easier to include when the meaning of
             | a sentence doesn't depend on word order.
        
               | thaumasiotes wrote:
               | > stylistic expressions such as meter, juxtaposition,
               | symmetry are so much easier to include when the meaning
               | of a sentence doesn't depend on word order.
               | 
               | Eh.... some things are easy and some things are hard in
               | any language. The specifics differ, and so do the details
               | of what kinds of things you're looking for in poetry.
               | Traditional Germanic verse focuses on alliteration.
               | Modern English verse focuses on rhyme. Latin verse
               | focuses on neither. [1]
               | 
               | English divides poetically strong syllables from
               | poetically weak syllables according to stress. It also
               | has mechanisms for promoting weak syllables to strong
               | ones if they're surrounded by other weak syllables.
               | 
               | In contrast, Latin divides strong syllables from weak
               | syllables by length. Stress is irrelevant. But while
               | stress can be changed easily, you're much more restricted
               | when it comes to syllable length -- and so Publius
               | Ovidius Naso is _invariably_ referred to by cognomen in
               | verse, because _it isn 't possible_ to fit his nomen,
               | Ovidius, into a Latin metrical scheme. That's not a
               | problem English has.
               | 
               | [1] I am aware of one exceptional Latin verse:
               | 
               | > O Tite, tute, Tati, tibi tanta, tyranne, tulisti.
        
               | [deleted]
        
         | mcv wrote:
         | The real problem here is that people use Excel to maintain
         | data. Excel is terrible at that. But the fact that it may
         | change data without the user being aware of it, is absolutely
         | the biggest failing here.
        
           | slightwinder wrote:
           | The problem is more that it's insanly overpowered, while
           | aiming for convenience out of the box. An "Excel Pro"-Version
           | which takes away all the convenience and gives the user the
           | power to configure the power pinpointet to their task might
           | be a better solution. Funny part is, most of those things are
           | already configurable now, but users are not educated enough
           | about their tools to actually do it.
        
           | mfer wrote:
           | Excel allows people to maintain data all over the place. From
           | golf league data to job actual data compared to estimates to
           | so much more. And, excel is accessible enough that tens of
           | millions (or maybe more) of people do it.
        
         | qwertox wrote:
         | Regarding Excel: It also happens with Somalia, which makes this
         | issue even stranger. Apparently because of "SOM".
        
         | bilalq wrote:
         | I don't understand why those support agents for Microsoft just
         | threw their hands up in the air and asked customers to go
         | through some special process for reporting the bug in Excel.
         | Why are they not empowered/able to report the issue on behalf
         | of customers? It's so clearly a bug in Excel that even they are
         | able to reproduce with 100% reliability.
        
           | sneak wrote:
           | It looks like it is intended behavior in Excel.
        
             | njarboe wrote:
             | Yes. Excel cells are set to a "General" format that, by
             | default, tries to guess the type of data the cell should be
             | from its content. A date looking entry gets converted to a
             | date type. A number looking string to a number (so 5.80 -->
             | 5.8, very annoying since I believe in significant digits)
             | When you import cvs data, for example, the default import
             | format is "General" so date looking strings will be changed
             | to a date format. This can be avoided by importing the file
             | and choosing to import the data as "Text". People having
             | these data corruption problems forgot to do that.
             | 
             | It's "user error" except that there is no way to set the
             | default import to import as "Text" (as far as I know), so
             | one has to remember to do the three step "Text" import
             | every time instead of the default one step "General"
             | import.
        
         | dalbasal wrote:
         | I suppose this is a cliched thought, but the more general
         | problem kind of emblematic of current "smart" features... and
         | their expected successors.
         | 
         | OOH, this is a a typically human problem. We have a system.
         | It's partly designed, partly evolved^. It's true enough to
         | serve well in the contexts we use it in on most days. There are
         | bugs in places (like norway, lol) that we didn't think of
         | initially, and haven't encountered often enough to evolve
         | around.
         | 
         | In code, we call it bugs. In bureaucracy, we just call it
         | bureaucracy. Agency A needs institution B's document X, in a
         | way that has bugs.
         | 
         | Obviously, it's also a typical machine problem. @hitchdev wants
         | to tell pyyaml that Norway exists, and pyyaml doesn't
         | understand. A user wants to enter "MARCH1" as text (or the name
         | of a gene), and excel doesn't understand.
         | 
         | Even the most rigid bureaucracy is made of people and has
         | fairly advanced comprehension ability though. If Agency A,
         | institution B or document X are so rigid that "NO" or "MARCH1"
         | break them... it probably means that there's a machine bug
         | behind the human one.
         | 
         | Meanwhile... a human reading this blog (even if they don't
         | program) can understand just fine from context and assumptions
         | of intent.
         | 
         | IDK... maybe I'm losing my edge, but natural language
         | programming is starting to seem like a possibility to me.
         | 
         | ^I feel like we need a new word for these: versioned, maybe?
        
         | masklinn wrote:
         | > This is part of more general problem
         | 
         | The more general problem basically being sentinel values (which
         | these sorts of inferences can be treated as) in stringly-typed
         | contexts: if everything is a string and you match some of those
         | for special consideration, you will eventually match them in a
         | context where that's wholly incorrect, and break something.
        
           | pdkl95 wrote:
           | edit: fixed formatting problem
           | 
           | > sentinel values
           | 
           | Using in-band signaling always involves the risk of
           | misinterpreting types.
           | 
           | > This is part of more general problem
           | 
           | DWIM ("Do What I Mean") was a terrible way to handle typos
           | and spelling errors when Warren Teitelman tried it at Xerox
           | PARC[1] over 50 years ago. From[2]:
           | 
           | >> In one notorious incident, Warren added a DWIM feature to
           | the command interpreter used at Xerox PARC. One day another
           | hacker there typed                   delete *$
           | 
           | >> to free up some disk space. (The editor there named backup
           | files by appending $ to the original file name, so he was
           | trying to delete any backup files left over from old editing
           | sessions.) It happened that there weren't any editor backup
           | files, so DWIM helpfully reported                   *$ not
           | found, assuming you meant 'delete *'
           | 
           | >> [...] The disgruntled victim later said he had been sorely
           | tempted to go to Warren's office, tie Warren down in his
           | chair in front of his workstation, and then type 'delete *$'
           | twice.
           | 
           | Trying to "automagically" interpret or fix input is always a
           | terrible idea because you cannot discover the actual _intent_
           | of an author from the text they wrote. In literary criticism
           | they call this problem  "Death of the Author"[3].
           | 
           | [1] https://en.wikipedia.org/wiki/DWIM
           | 
           | [2] http://www.catb.org/jargon/html/D/DWIM.html
           | 
           | [3]
           | https://tvtropes.org/pmwiki/pmwiki.php/Main/DeathOfTheAuthor
        
             | wnoise wrote:
             | Eh. "Death of the Author" is a reaction to the text not
             | being dispositive as to what the author meant. It's
             | deciding you don't care what the author meant, no longer
             | considering it a problem that the text doesn't reveal that.
             | Instead the text means whatever you can argue it means.
             | 
             | Which can be a fun game, but is ultimately pointless.
        
             | lisper wrote:
             | >> [...] The disgruntled victim later said he had been
             | sorely tempted to go to Warren's office, tie Warren down in
             | his chair in front of his workstation, and then type
             | 'delete $' twice.
             | 
             | Ironically, this did not render the way you intended
             | because HN interpreted the asterisk as an emphasis marker
             | in this line.
             | 
             | It works here:                   ... type 'delete *$'
             | twice.
             | 
             | because the line is indented and so renders as code, but
             | not here:
             | 
             | > ... type 'delete _$ ' twice.
             | 
             | because the subsequent line has _emphasized text*. So the
             | scoping of the asterisks is all screwed up.
        
           | chrisdone wrote:
           | That's a shrewd observation. Static types help with this
           | somewhat. E.g. in Inflex, if I import some CSV and the string
           | "00.10" as 0.1, then later when you try to do work on it like
           | 
           | x == "00.10"
           | 
           | You'll get a type error that x is a decimal and the string
           | literal is a string. So then you know you have to reimport it
           | in the right way. So the type system told you that an
           | assumption was violated.
           | 
           | This won't always happen, though. E.g. sort by this field
           | will happily do a decimal sort instead of the string 00.10.
           | 
           | The best approach is to ask the user at import time "here is
           | my guess, feel free to correct me". Excel/Inflex have this
           | opportunity, but YAML doesn't.
           | 
           | That is, aside from explicit schemas. Mostly, we don't have a
           | schema.
        
             | alpaca128 wrote:
             | > E.g. sort by this field will happily do a decimal sort
             | instead of the string 00.10.
             | 
             | So that system is not consistent with type checking? How is
             | this not considered a bug?
        
               | chrisdone wrote:
               | I mean if the value is imported as a decimal, then a sort
               | by that field will sort as decimal. This might not be
               | obvious if a system imports 23.53, 53.98 etc - a user
               | would think it looks good. It only becomes clear that it
               | was an error to import as a decimal when we consider
               | cases like "00.10". E.g, package versions: 10.10 is a
               | newer version than 10.1.
               | 
               | Types only help if you pick the right ones.
        
             | dalbasal wrote:
             | If we're talking about _general_ problems, then I don 't
             | think we can be satisfied with " _sometimes it 's a problem
             | with types and sometimes it's a UI bug_." That's not
             | general.
        
           | christophilus wrote:
           | Basically, autoimmune disease, but for software.
        
         | jgalt212 wrote:
         | and cusips, which are strings, get converted to scientific
         | notation.
         | 
         | https://social.msdn.microsoft.com/Forums/vstudio/en-US/92e0a...
        
         | afturkrull wrote:
         | > they had to rename a gene to stop excel auto-completing it
         | into a date.
         | 
         | No one in their right mind uses a spreadsheet for data
         | analysis. Good for working out your ideas but not in a
         | production environment. I figure excel was chosen as this the
         | utility the scientists were most familiar with.
         | 
         | The proper tool for the job would be a database. I recall
         | reading about a utility, a highly customized database with an
         | interface that looks just like a spreadsheet.
        
           | mattkrause wrote:
           | The analysis itself isn't (usually) happening in Excel.
           | 
           | A lot of tools operate on CSV files. People use Excel to peek
           | at the results or prepare input for other tools, and that's
           | how the date coercion slips in.
           | 
           | Sometimes, people do use it to collate the results of small
           | manual experiments, where a database might be overkill. Even
           | so, the data is usually analyzed elsewhere (R, graphPad,
           | etc).
        
         | andrepd wrote:
         | I'd say the more general problem is a bad type system! In any
         | language with a half decent type system where you can define
         | `type country = Argentina | ... | Zambia` this would be
         | correctly handled at compile-time, instead of having strange
         | dynamic weak typing rules (?) which throw runtime errors in
         | production (???).
        
         | wayoutthere wrote:
         | The one I've seen was a client who wanted to store credit card
         | numbers in an Excel sheet (yes I know this is a bad idea, but
         | it was 15 years ago and they were a scummy debt collection call
         | center). Signed integers have a size limit, which a 16 digit
         | credit card number significantly exceeds.
         | 
         | Now, you and I know this problem is solved by prepending ' to
         | the number and it will be treated as a string, but your average
         | Excel user has no understanding of types or why they might
         | matter. Many engineers will also look past this when generating
         | Excel reports.
        
         | zoward wrote:
         | An even more general problem is that we as humans use pattern-
         | matching as a cerebral tool to navigate our environment, and
         | sometimes the patterns aren't what they appear to be. The
         | Norway problem is the programming equivalent of an optical
         | illusion.
        
         | helsinkiandrew wrote:
         | > they had to rename a gene to stop excel auto-completing
         | 
         | I can just about understand that "No" might cause a problem,
         | but "Membrane Associated Ring-CH-Type Finger 1" being converted
         | to MAR-1 defeats me.
        
           | jasode wrote:
           | _> , but "Membrane Associated Ring-CH-Type Finger 1" being
           | converted to MAR-1 defeats me._
           | 
           | No, that's not what's happening. To clarify...
           | 
           | If you type a _41 characters_ long string of _" Membrane
           | Associated Ring-CH-Type Finger 1"_ into a cell -- Excel will
           | _not_ convert that to a date of MAR-1.
           | 
           | On the other hand, it's if you type an _6-char abbreviation_
           | of _" MARCH1"_ that _looks like a realistic date_ -- Excel
           | converts it to MAR-1.
        
       | jasode wrote:
       | That author's blog post sent me down a rabbit hole of insanity
       | with YAML and the PyYAML parser idiosyncrasies.
       | 
       | First, he mentions "YAML 2.0" but there's no such reference about
       | "2.0" from yaml.org or Google/Bing searches. Yaml.org and
       | wikipedia says yaml is at 1.2. Apparently the other commenters in
       | this thread clarified that the older "YAML 1.1" is what the
       | author is referring to.
       | 
       | Ok, if we look at the official YAML 1.1 spec[1], it has this
       | excerpt for implicit bool conversions:
       | y|Y|yes|Yes|YES|n|N|no|No|NO
       | |true|True|TRUE|false|False|FALSE       |on|On|ON|off|Off|OFF
       | 
       | But the pyyaml code excerpts[2][3] from resolver.py has this:
       | u'tag:yaml.org,2002:bool',
       | re.compile(ur'''^(?:yes|Yes|YES|n|N|no|No|NO
       | |true|True|TRUE|false|False|FALSE
       | |on|On|ON|off|Off|OFF)$''', re.X),
       | 
       | The programmer _omitted_ the single character options of  'y' and
       | 'Y' but it still has 'n' and 'N' ?!? The lack of symmetry makes
       | the parser inconsistent.
       | 
       | And btw for trivia... PyYAML also converts strings with leading
       | zeros to numbers like MS Excel:
       | https://stackoverflow.com/questions/54820256/how-to-read-loa...
       | 
       | [1] https://yaml.org/type/bool.html
       | 
       | [2] 2020 latest:
       | https://github.com/yaml/pyyaml/blob/ee37f4653c08fc07aecff69c...
       | 
       | [3] 2006 original :
       | https://github.com/yaml/pyyaml/blob/4c570faa8bc4608609f0e531...
        
       | atombender wrote:
       | The world _desperately_ needs a replacement for YAML.
       | 
       | TOML is fine for configuration, but not an adequate solution for
       | representing arbitrary data.
       | 
       | JSON is a fine data exchange format, but is not particularly
       | human-friendly, and is especially poor for editable content:
       | Lacks comments, multi-line strings, is far too strict about
       | unimportant syntax, etc.
       | 
       | Jsonnet (a derivative of Google's internal configuration
       | language) is very good, but has failed to reach widespread
       | adoption.
       | 
       | Cue is a newer Jsonnet-inspired language that ticks a lot of
       | boxes for me (strict, schema support, human-readable, compact),
       | but has not seen wide adoption.
       | 
       | Protobuf has a JSON-like text format that's friendlier, but I
       | don't think it's widely adopted, and as I recall, it inherits a
       | lot of Protobufisms.
       | 
       | Dhall is interesting, but a bit too complex to replace YAML.
       | 
       | Starlark is a neat language, but has the same problem as Dhall.
       | It's essentially a stripped-down Python.
       | 
       | Amazon Ion [1] is neat, but I've not seen any adoption outside of
       | AWS.
       | 
       | NestedText [2] looks promising, but it's just a Python library.
       | 
       | StrictYAML [3] is a nice attempt at cleaning up YAML. But we need
       | a new language with wide adoption across many popular languages,
       | and this is Python only.
       | 
       | Any others?
       | 
       | [1] https://amzn.github.io/ion-docs/
       | 
       | [2] https://nestedtext.org/
       | 
       | [3] https://github.com/crdoconnor/strictyaml/
        
         | svnpenn wrote:
         | You seem pretty quick to disregard TOML. I switched all my JSON
         | and YAML for TOML. Do you care to detail what is missing?
        
           | atombender wrote:
           | TOML quickly breaks down with lots of nested arrays of
           | objects. For example:                   a:           b:
           | - c: 1           - d:             - e: 2             - f:
           | g: 3
           | 
           | Turns into this, which is unreadable:
           | [[a.b]]         c = 1              [[a.b]]         [[a.b.d]]
           | e = 2              [[a.b.d]]         [a.b.d.f]         g = 3
           | 
           | TOML also has a few restrictions, such as not supporting
           | mixed-type arrays like [1, "hello", true], or arrays at the
           | root of the data. JSON can represent any TOML value (as far
           | as I know), but TOML cannot represent any JSON value.
           | 
           | At my company we use YAML a lot for table-driven tests (e.g.
           | [1]), and this not only means lots of nested arrays, but also
           | having to represent pure data (i.e. the expected output of a
           | test), which requires a format that supports encoding
           | arbitrary "pure" data structures of arrays, numbers, strings,
           | booleans, and objects.
           | 
           | [1] https://github.com/sanity-io/groq-test-suite/
        
             | svnpenn wrote:
             | Looks fine to me:                   [[a.b]]         c = 1
             | d = [            { e = 2 },            { f = { g = 3 } }
             | ]
        
               | timClicks wrote:
               | An improvement, but the original YAML is still
               | significantly better, in my opinion.
        
               | Arnavion wrote:
               | Also many (most? all?) serializers don't let you control
               | which fields are serialized inline vs not. So if you have
               | a program that _generates_ configuration, you 're going
               | to end up with the original unreadable form anyway.
        
         | ak217 wrote:
         | I don't think YAML is going anywhere, largely because it was
         | the first format to prioritize readability and conciseness, and
         | has used that advantage to achieve critical mass.
         | 
         | It's far more productive to push for incremental changes to the
         | YAML spec (or even a fork of it) to make it more sane and
         | better defined. Things like a StrictYAML subset mode for
         | parsers in other popular languages.
        
           | dragonwriter wrote:
           | > It's far more productive to push for incremental changes to
           | the YAML spec
           | 
           | The problems this article raises and strictyaml purports to
           | address were addressed in YAML 1.2, already supported in
           | python via ruamel.yaml; YAML 1.2 addresses much of this in
           | the Core schema which is the closest successor to the default
           | behavior of earlier spec versions, and does so more
           | completely in the support for schemas more generally, which
           | define both the supported "built-in" tags (roughly, types)
           | and how they are matched from the low-level representation
           | which consists only of strings, sequences, and maps (which,
           | incidentally, are the only three tags of the "Failsafe"
           | schema; there's also a "JSON" Schema between Failsafe and
           | Core, which has tags corresponding to the types supported by
           | JSON.
        
         | fmakunbound wrote:
         | XML and XML Schema solved this more than 20 years ago. It had
         | to be replaced with JSON by the web developers though, so they
         | could just "eval() it" to get their data.
        
           | servercobra wrote:
           | All except the easily written by humans part. Which is kind
           | of a key part.
        
           | jdeisenberg wrote:
           | XML with RelaxNG (https://relaxng.org/) would have made life
           | so much better than using XML Schema, but, as they say, that
           | ship has long since sailed.
        
           | MrPatan wrote:
           | If all the smart people like you used XML, how come it was so
           | painful to use and it died?
        
             | [deleted]
        
             | rayiner wrote:
             | <humor>It died because web developers weren't bright enough
             | to understand schemas.
             | 
             | </humor>
        
             | takeda wrote:
             | Because it offered all these things parent responded, but
             | that made it too complex. You either provide schema and get
             | commodities of describing it or you don't.
             | 
             | I had a chance of using SOAP at one point. It was a F5
             | device and I used a python library. What I really liked is
             | that when it connected to it it downloaded its schema, and
             | then used that to generate an object. At that point you
             | just communicated with device like you did with any object
             | in Python.
             | 
             | We abandoned it for inferior technologies like REST and
             | JSON, because they were harder to use from JS, as parent
             | mentioned.
        
               | MrPatan wrote:
               | Parent didn't say it was harder to use from JS. Parent
               | said "It had to be replaced with JSON by the web
               | developers though, so they could just "eval() it" to get
               | their data."
               | 
               | First of all, I was there 20 years ago. I had to deal
               | with XML, XSLT, one kind of Java XML parsers that didn't
               | fully do what I needed, another kind of Java XML parsers
               | that didn't fully do what I needed. And oh boy was it a
               | pain. I just wanted to get a few properties of a bunch of
               | entities in a bigger XML document, that's all. Big fail.
               | 
               | Second, JSON always had a parser in JS, so I don't know
               | where that eval nonsense is coming from.
               | 
               | Third, JS actually had the best dev UX for XML of all
               | languages 20 years ago. Maybe you know JavaScript from
               | Node.js, but 20 years ago it used to run excusively in
               | web browsers, which even then were pretty good at parsing
               | XML documents. The browser of course had a JS DOM
               | traversal API known to every single JS developer, and
               | very soon (Although TBH I can't remember if before or
               | after JSON) it also had xpath querying functions, all
               | built in.
               | 
               | XML was _so bad_ , that its replacement came from the
               | language where it was actually easiest to use. think
               | about that for a second.
               | 
               | So the answer to the question "Why was XML replaced?" is
               | not "Because webdevs lol".
               | 
               | I suspect it was because it has both content and
               | attributes, which all but guarantees it's impossible to
               | create a bunch of simple, common data structures from it
               | (like JSON does).
        
             | [deleted]
        
         | ng12 wrote:
         | Jsonnet hasn't taken off because it's turing complete. It's a
         | really great language for generating JSON but not a replacement
         | for JSON.
        
         | diggan wrote:
         | Seems you're missing my personal favorite, extensible data
         | notation - EDN (https://github.com/edn-format/edn). Probably
         | I'm a bit biased coming from Clojure as it's widely used there
         | but haven't really found a format that comes close to EDN when
         | it comes to succinctness and features.
         | 
         | Some of the neat features: Custom literals / tagged elements
         | that can have their support added for them on runtime/compile
         | time (dates can be represented, parsed and turned into proper
         | dates in your language). Also being able to namespace data
         | inside of it makes things a bit easier to manage without having
         | to result to nesting or other hacks. Very human friendly, plus
         | machine friendly.
         | 
         | Biggest drawback so far seems to be performance of parsing,
         | although I'm not sure if that's actually about the format
         | itself, or about the small adoption of the format and therefore
         | not many parsers focusing on speed has been written.
        
         | mc10 wrote:
         | S-expressions are super easy to parse and are fairly easy for
         | humans to read. See e.g. using s-expressions in OCaml:
         | https://dev.realworldocaml.org/data-serialization.html
        
           | Nihilartikel wrote:
           | Apropos of this, in Clojure-land the idiomatic serialization
           | is, EDN [1], which is pretty ergonomic to work with IMO,
           | since in most cases it is the same as a data-literal in
           | Clojure.
           | 
           | My feeling is that :keywords reduce the need and temptation
           | to conflate strings and boolean/enumerations that occurs when
           | there's no clear way to convey or distinguish between a
           | string of data and a unique named 'symbol'. I miss them when
           | I'm in Pythonland.
           | 
           | [1] https: https://www.compoundtheory.com/clojure-edn-
           | walkthrough/
        
           | gnud wrote:
           | S-expressions inherits all trouble with data types from json
           | (dates, times, booleans, integer size, number vs numeric
           | string).
           | 
           | You get neat ways of nesting data, but that is not enough for
           | a robust and mistake-resilient configuration language.
           | 
           | The problem isn't parsing in itself. The problem is having
           | clear sematics, without devolving into full SGML DTDs (or
           | worse still, XML schemas).
        
             | diggan wrote:
             | > S-expressions inherits all trouble with data types from
             | json (dates, times, booleans, integer size, number vs
             | numeric string).
             | 
             | Hm, not sure that's true, S-expressions would only define
             | the "shape" of how you're defining something, not the
             | semantics of how you're defining something. EDN
             | https://github.com/edn-format/edn for all purposes is
             | S-expressions and have support for custom literals and
             | more, to avoid "the trouble with data types from JSON"
        
         | rubyn00bie wrote:
         | Your list is like a graveyard of my dreams and hopes. Anything
         | that doesn't validate the format of the underlying data is
         | pretty much dead to me...
         | 
         | The problem with most of these is they're useless to describe
         | the data. Honestly, it is completely not useful to have the
         | following to describe data:
         | 
         | email => string
         | 
         | name => string
         | 
         | dob => string
         | 
         | IMHO, it is akin to having a dictionary (like Oxford English)
         | read like:
         | 
         | email - noun
         | 
         | name - noun
         | 
         | birthday - noun
         | 
         | It says next to nothing except, yes, they are nouns. All too
         | often I waste time fighting nils and bullshit in fields or
         | duplicating validation logic all over the place.
         | 
         | "Oh wow, this field... is a string..? That's great... _smiles
         | gently_ except... THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID,
         | SCHEMA-CHUD. GET THE FUCK OFF MY LAWN! "
        
           | scythe wrote:
           | If you want automatic built-in string validation, one option
           | that seems particularly interesting is to use a variant of
           | Lua patterns, which are weaker and easier to understand than
           | regular expressions, but still provide a significant degree
           | of "sanity" for something like an email. The original version
           | works on bytes and not runes, but you could simply write a
           | parser that works on runes instead, and the pattern-matching
           | code is just 400 old and battle-tested lines of C89. You
           | might want to add one extension: allow for escape sequences
           | to be treated as a single character (hence included in
           | repetition operators and adding the capability to match
           | quoted strings); with this extension, I think you could
           | implement full email address validation:
           | 
           | https://i.stack.imgur.com/YI6KR.png
           | 
           | Lua patterns have also shown up in other places, such as
           | BSD's httpd, and an implementation for Rust:
           | 
           | https://www.gsp.com/cgi-bin/man.cgi?section=7&topic=PATTERNS
           | 
           | https://github.com/stevedonovan/lua-patterns
           | 
           | http://lua-users.org/wiki/PatternsTutorial
        
           | geoduck14 wrote:
           | >THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID
           | 
           | thanks for the lolz
        
           | Nitramp wrote:
           | My experience is that validation quickly becomes surprisingly
           | complex, to the point of being infeasible to express in a
           | message format.
           | 
           | Not only are the constraints very hard to express (remember
           | that one 2000 char regexp that really validates email
           | addresses?), they are also contextual: the correct validation
           | in an Android client is not the same as on the server side.
           | Eg you might want to check uniqueness or foreign key
           | constraints that you cannot check on the client. Sometimes
           | you want to store and transmit invalid messages (eg partially
           | completed user input). And then you have evolving validation
           | requirements: what do you do with the messages from three
           | years ago that don't have field X yet?
           | 
           | Unfortunately I don't think you can express what you need in
           | a declarative format. Even minimal features such as regexp
           | validation or enums have pitfalls.
           | 
           | I think it's better to bite the bullet and implement the
           | contextually required validation on each system boundary, for
           | any message crossing boundaries.
        
           | sangnoir wrote:
           | It sounds to me like XML with a DTD & XSD would solve your
           | problem. XML no longer fashionable, but its validation is
           | Turing-complete
        
           | tormeh wrote:
           | I agree with this, something RON/JSON-like with type
           | annotations would be great:                   {
           | "isTrue":false:Boolean,
           | "id":"123e4567-e89b-12d3-a456-426614174000":UUID         }
        
         | djedr wrote:
         | Still early, but here's my baby I hope can improve things:
         | 
         | website with grammar spec: https://tree-annotation.org/
         | 
         | prototype of a JSON/YAML alternative for JS:
         | https://github.com/tree-annotation/tao-data-js
         | 
         | same thing, even less finished for C#: https://github.com/tree-
         | annotation/tao-data-csharp
         | 
         | working on it constantly, more to come soon
        
         | dragonwriter wrote:
         | > The world desperately needs a replacement for YAML.
         | 
         | The world desperately needs support for YAML 1.2, which solves
         | the problems the article addresses fairly completely (largely
         | in the "default" Core schema[0], but more completely with the
         | support for schemas in general), plus a bunch of others, and
         | has for more than a decade. But YAML 1.2 libraries aren't
         | available for most languages.
         | 
         | [0] not actually an official default, but reflects a cleanup of
         | the YAML 1.1 behavior without optional types, so its
         | defaultish. Back when it looked like YAML 1.3 might happen in
         | some reasonably-near future, it was actually indicated by team
         | members that the JSON Schema for YAML (not to be confused with
         | the JSON Schema spec) would be the explicit default YAML Schema
         | in 1.3, which has a lot to recommend it.
        
           | tormeh wrote:
           | Nope nope nope. YAML is awful and needs to die. The more you
           | look at it the worse it gets. The basic functionality is
           | elegant (at least until you consider stuff like The Norway
           | Problem), but the advanced parts of YAML are batshit insane.
        
             | dragonwriter wrote:
             | "The Norway Problem" is a YAML 1.1 problem, of which there
             | are many.
             | 
             | What advanced parts of YAML are you talking about that
             | remain problems in YAML 1.2?
        
               | medstrom wrote:
               | From the article:
               | 
               | > The most tragic aspect of this bug, howevere, is that
               | it is intended behavior according to the YAML 2.0
               | specification.
        
               | dragonwriter wrote:
               | The article is simply, factually wrong; there is no "YAML
               | 2.0 specification" [0], and everything they point to is
               | YAML 1.1, and addressed in YAML 1.2 (the most recent YAML
               | spec, from 2009.)
               | 
               | [0] https://yaml.org/
        
         | geraldbauer wrote:
         | You might look at JSON Next variants (if you remember -
         | "classic" JSON is a subset of YAML), see
         | https://github.com/json-next/awesome-json-next
         | 
         | My own little JSON Next entry / format is called JSON 1.1 or
         | JSONX, that is, JSON with eXtensions, see https://json-
         | next.github.io
        
           | orthoxerox wrote:
           | The list is missing http://www.relaxedjson.org/
           | 
           | Also, there's no explanation what <..-..> and <..+..> do.
        
         | tormeh wrote:
         | Also RON: https://github.com/ron-rs/ron
         | 
         | A bit like JSON5, but I believe even more advanced.
        
         | hansvm wrote:
         | > The world desperately needs a replacement for YAML.
         | 
         | For situations like TFA you really want a configuration
         | language that behaves exactly like you think it will, and since
         | you don't have to interop with other organizations you don't
         | really need a global standard.
         | 
         | Moreover, broadly used config languages can be somewhat
         | counterproductive to that goal. Take JSON as an example;
         | idiomatic JSON serdes in multiple programming languages has
         | discrepancies in minint, maxfloat, datetime, timezone, round-
         | tripping, max depth, and all kinds of other nuanced issues.
         | Existing tooling is nice when it does what you expect, but for
         | a no-frills, no-surprises configuration language I would almost
         | always just prefer to use the programming language itself or
         | otherwise write a parser if that doesn't suffice (e.g., in
         | multilingual projects).
         | 
         | Mildly off-topic: The problem here, more or less, was that the
         | configuration change didn't have the desired effect on an in-
         | memory representation of that configuration. We can mitigate
         | that at the language level, but as a sanity check it's also a
         | good idea to just diff the in-memory objects and make sure the
         | change looks kind of like what you'd expect.
        
           | atombender wrote:
           | You don't need wide adoption for internal projects in an
           | organization, but you _do_ want great toolchain support.
           | 
           | For example, the fact that NestedText is a Python library
           | means a Python team could use it, but it's a poor fit for an
           | organization whose other teams use Go and
           | JavaScript/TypeScript.
           | 
           | We use YAML for much more than configuration, by the way. I
           | feel like YAML hits a nice sweet spot where it's usable for
           | almost everything.
        
         | IshKebab wrote:
         | JSON5 is the best option currently. A fair number of tools in
         | the JS ecosystem support it.
        
           | atombender wrote:
           | JSON5 is better than JSON on my points, but it has downsides
           | compared to YAML. For example, YAML is _very_ good at
           | multiline strings that don 't require any sort of quoting,
           | and knows to remove preceding indentation:
           | foo: |         "This is a string that goes across
           | multiple lines," he wrote.
           | 
           | In JSON5, you'd have to write:                 {         foo:
           | \"This is a string that goes across \       multiple lines,\"
           | he wrote."       }
           | 
           | This sort of ergonomic approach is why YAML is so well-liked,
           | I think. (Granted, YAML's use of obscure Perl-like sigils to
           | indicate whitespace mode is annoying, but it does cover a lot
           | of situations.)
           | 
           | YAML is also great at arrays, mimicking how you'd write a
           | list in plaintext:                 foo:       - "hello"
           | - 42       - true
        
         | dqpb wrote:
         | I've used most of the technologies you listed. Cue is the best,
         | and the only one with strong theoretical foundations. I've been
         | using it for some time now and won't go back to the others.
        
         | debug-desperado wrote:
         | Thanks for this list, I've never heard of Ion. I'll consider it
         | for config and even replacing Avro & Protobuf in future
         | projects.
        
       | joshxyz wrote:
       | This is why i love JSON. It's only string, number, boolean,
       | arrays, objects/dictionaries, unless you write custom serializer
       | and deserializers..
        
         | lokedhs wrote:
         | Except that its numbers are underspecified and cannot be used
         | safely outside of a certain range. The spec explicitly states
         | that the precision of numbers is not defined, meaning that N
         | and N+1 may be the same number, and its behaviour would depend
         | on the parser you're using.
         | 
         | The number one rule when creating a serialisation format should
         | be that serialisation and deserialisation is predictable. It's
         | quite remarkable that two of the most popular formats doesn't
         | do this.
         | 
         | I'm actually surprised we haven't seen any major security
         | issues caused by this.
        
       | bmn__ wrote:
       | The problem is insufficiently analysed by the article author and
       | the commenters in this thread so far. It is very superficial. The
       | recent thread "Can't use iCloud with "true" as the last name"
       | https://news.ycombinator.com/item?id=26364993 went deeper. Let me
       | take up its relevant particulars into this thread.
       | 
       | The article author hitchdev does not say it outright, but it is
       | heavily implied that the YAML file was edited by hand. This is
       | the immediate cause of the problem. The indirect root of the
       | problem is that the spec authors chose a plain text serialisation
       | format and thus created an _affordance_
       | http://enwp.org/Affordance#As_perceived_action_possibilities to
       | be edited by hand.
       | 
       | This turns out the be unsafe/source of bugs because YAML end-
       | users are not capable of correctly applying the serialisation
       | rules considering the edge cases detailed in the article because
       | humans are creatures of habit, applying analogy and common sense,
       | making assumptions and then sometimes go wrong, whereas a piece
       | of software will not make the Norway, Null etc. mistakes.
       | hitchdev even writes that quoting the string is "a fix for sure,
       | but kind of a hack", but that's a grave misunderstanding. Quoting
       | the string here is actually applying the serialisation rules
       | correctly.
       | 
       | The tangential at the end of the article about typing is also
       | orthogonal/irrelevant. YAML is strictly/strongly/unambiguously
       | typed, and so is the mentioned variant Strict YAML. The
       | difference is that Strict YAML has serialisation rules that are
       | more amenable to or aligning with the human factors of habit etc.
       | and thus work better in practice.
       | 
       | My personal recommendation is to never edit YAML by hand and
       | always use a serialiser. This is less convenient, but safe.
       | 
       | In closing, I would like the reader of this comment to make an
       | effort to distinguish between "what is" and "what ought to be" in
       | their head, otherwise the ideas here will be very muddled.
        
         | dragonwriter wrote:
         | > The problem is insufficiently analysed by the article author
         | 
         | The article author also misidentifies the version of the YAML
         | spec (calling it 2.0, which doesn't exist; the behavior is from
         | YAML 1.1, and this class of problems motivated a bunch of
         | changes in YAML 1.2, which has been out since 2009.)
         | 
         | But the article author isn't trying to analyze the problem,
         | he's trying to rationalize why what is notionally a YAML-
         | processing library just ignores the spec.
        
         | Aeolun wrote:
         | > never edit YAML by hand and always use a serialiser
         | 
         | I don't follow this. If yaml is your config format, and you are
         | not editing it by hand, what are you editing?
        
           | bmn__ wrote:
           | I work on the deserialisation. This is a one-liner in many
           | programming languages.
        
         | sfvisser wrote:
         | The problem is not 'someone is not correctly following the
         | serialization rules', the problem is 'the serialization rules
         | are quite terrible'.
         | 
         | This is not some interesting trade-off, this problem is fixable
         | on all axes by using non-ambiguous, non-overloaded typing rules
         | for your config format.
         | 
         | Even JSON and XML got this right.
        
           | bmn__ wrote:
           | > The problem is not 'someone is not correctly following the
           | serialization rules'
           | 
           | Yes, yes, I pointed that out. grep "immediate cause" and
           | "indirect root"
           | 
           | > the serialization rules are quite terrible
           | 
           | Did that need to be said explicitly? I agree FWIW. I have
           | already made a value judgement mildly against YAML, in case
           | that's not clear. It's only mild because the problem can be
           | worked around. I think this approach is more practical than
           | moving the whole world over to a completely different thing.
           | 
           | > problem is fixable [...] non-ambiguous [...] rules
           | 
           | Is the implication here that you say YAML is ambiguous? It's
           | not. I don't want sloppy analysis. To be precise, the
           | ambiguity is imagined, it does not exist on the spec or
           | software level, only in the head of people.
        
         | atleta wrote:
         | The very point of yaml is that it is _easy_ to edit by hand. If
         | you use an, I suppose, GUI editor then you don 't need yaml.
         | You could use any strictly typed serialization format. (Self
         | describing or with a schema.)
        
       | NaturalPhallacy wrote:
       | This is why implicit typing is an _invitation_ to errors.
        
       | paulintrognon wrote:
       | I am sometimes annoyed by the fact you have to put double quotes
       | around string properties in JSON. It would be so much lighter to
       | use JS syntax..! Then I read articles like this one. Thank you
       | JSON for not trying to be smart.
        
       | teddyh wrote:
       | This is another good argument against weak types in general.
       | Strong types are better, and explicit is better than implict.
        
       | pintxo wrote:
       | > "While the website went down and we were losing money we chased
       | down a number of loose ends until finally finding the root
       | cause."
       | 
       | And that's why you have a staging environment. Or you debug in
       | production, whatever you prefer.
        
         | atoav wrote:
         | I'd go further and say _this is why you write tests_. Creating
         | tests that cover a lot (or all) possible inputs is sometimes
         | not that hard and really pays off if you manage to catch a very
         | common error like the Norway thing. Even better if you catch
         | something that would have been a nightmare to fix in
         | production.
         | 
         | I say this because two days ago I wrote a test that used all
         | country codes as input. It took 15 minutes to write that test.
         | During the whole testing session I found at least 5 mistakes of
         | which 3 would have been quite dramatic.
        
           | simion314 wrote:
           | >I say this because two days ago I wrote a test that used all
           | country codes as input. It took 15 minutes to write that
           | test. During the whole testing session I found at least 5
           | mistakes of which 3 would have been quite dramatic.
           | 
           | And how many minutes to test all
           | city/state/region/street/person names ?
           | 
           | It can also happen that you test s will become outdated, like
           | when url standard changed and more characters codes were
           | allowed.
        
         | mrighele wrote:
         | Or you just return to the previos (and working) version of the
         | website while you fix the issue. At least if you a good old
         | monolith; if you have 10s of microservices it may be more
         | complicated
        
         | groundCode wrote:
         | Bugs make it to production no matter how careful you are.
         | 
         | What matters is how you deal with incidents as an organisation,
         | not that you should never release a bug.
        
         | eitland wrote:
         | Everybody has a testing environment. Some people are lucky
         | enough enough to have a totally separate environment to run
         | production in.
         | 
         | https://mobile.twitter.com/stahnma/status/634849376343429120
        
       | pietroppeter wrote:
       | I like strict yaml but I have used it very little. Anyone who
       | uses it more that can give more feedback?
        
       | [deleted]
        
       | grenoire wrote:
       | I was helping out a friend of mine in the risk department of a
       | Big 4; he was parsing CSV data from a client's portfolio. Once he
       | started parsing it, he was getting random NaNs (pandas' nan type,
       | to be more accurate).
       | 
       | I couldn't get access to the original dataset but the column gave
       | it away. Namibia's 2-letter ISO country code is NA--which happens
       | to be in pandas' default list of NaN equivalent strings.
       | 
       | It was a headache and a half...
        
         | mseepgood wrote:
         | A Ms True also broke Apple's iCloud:
         | https://twitter.com/RachelTrue/status/1365461618977476610
        
           | grenoire wrote:
           | That looks like an interesting hard-coded check, I wonder
           | what it intended to fix.
        
             | fanf2 wrote:
             | There's some analysis in this twitter thread:
             | https://twitter.com/badedgecases/status/1368362392573317120
             | 
             | tl;dr: there are a bunch of fields of various types that
             | arrive as strings, and they get coerced but without paying
             | attention to which field should have which type
        
         | grenoire wrote:
         | Verbatim from the docs, on read-csv:
         | na_valuesscalar, str, list-like, or dict, default None
         | Additional strings to recognize as NA/NaN. If dict passed,
         | specific per-column NA values. By default the following values
         | are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA',
         | '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN',
         | '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
         | 
         | You fix it by using `keep_default_na=False`, by the way.
        
       | dragonwriter wrote:
       | its weird that this is a 2019 article misrepresenting behavior in
       | the YAML 1.1 spec (2005) most of which reverted in the YAML 1.2
       | spec (2009) as being part of a nonexistent YAML 2.0 spec and
       | justifying a library that purports to handle "YAML" ignoring the
       | spec.
        
         | atombender wrote:
         | You're right, but it's worth noting that much of the world is
         | still on YAML 1.1, for whatever reason, so _in practice_ ,
         | these are actual problems that will be encountered in the real
         | world.
         | 
         | For example, Ruby's standard library only supports YAML 1.1. It
         | relies on libyaml, which is not yet compliant with 1.2.
         | Meanwhile, Python's popular PyYAML library only supports 1.1,
         | and asks users to migrate to a newer fork called ruamel.yaml
         | for 1.2 support.
        
           | dragonwriter wrote:
           | > You're right, but it's worth noting that much of the world
           | is still on YAML 1.1
           | 
           | This is an article justifying use of (and justifying design
           | decisions of) a particular Python quasi-YAML parsing library.
           | If you are in a position to select a non-YAML-1.1-compliant
           | parsing library for Python, or to take the articles advice on
           | design of a YAML(-ish) parsing library, you are, necessarily,
           | _not_ stuck with YAML 1.1.
           | 
           | > for whatever reason
           | 
           | Articles like this spreading misinformation about the current
           | state of standard YAML are part of the reason. LibYAML
           | lagging support is another since so much of the ecosystem
           | depends on libYAML (though, while the documentation situation
           | is terrible, it looks like maybe libYAML has some level of
           | 1.2 support since 0.23.)
           | 
           | > For example, Ruby's standard library only supports YAML
           | 1.1. It relies on libyaml, [...] Python's popular PyYAML
           | library only supports 1.1
           | 
           | Which, also, is dependent on libYAML.
           | 
           | > and asks users to migrate to a newer fork called
           | ruamel.yaml for 1.2 support.
           | 
           | Which makes a lot more sense than migrating to a library thar
           | supports neither 1.1 nor 1.2, but a nonstandard variant that
           | addresses some of the same issues resolved years ago in 1.2,
           | especially when a library supporting 1.2 is available for the
           | same language.
        
       | namelosw wrote:
       | I prefer JSON over YAML because I spend more time confused and
       | burned by the problems caused by it.
       | 
       | I understand that people don't like directly use JSON because
       | it's not very friendly: no comments, no multi-line string, etc.
       | 
       | A great alternative IMHO is cson[0]. It's like JSON to JavaScript
       | but for CoffeeScript (though nobody talks about it nowadays). It
       | has indentation-based syntax, comments, and multiline string
       | which usually don't need to escape. The advantage is it's close
       | enough to JSON which is the canonical format that everybody can
       | agree on nowadays. For YAML and TOML there are too many visual
       | part-aways from JSON.
       | 
       | Or just create a JSON variant that enables comments and the
       | backtick multiline string from JavaScript.
       | 
       | [0] https://github.com/bevry/cson
        
       | jmartrican wrote:
       | It seems like we need to treat yaml like json and quote all
       | strings. Would that help resolve these issues? Just trying to
       | figure out a rule I can implement to prevent these issues.
        
       | WalterBright wrote:
       | > The most tragic aspect of this bug, howevere, is that it is
       | intended behavior according to the YAML 2.0 specification.
       | 
       | This is one of those great ideas that sadly one needs experience
       | to realize are really bad ideas. Every new generation of
       | programmers has to relearn it.
       | 
       | Other bad ideas that resurface constantly:
       | 
       | 1. implicit declaration of variables
       | 
       | 2. don't really need a ; as a statement terminator
       | 
       | 3. assert should not abort because one can recover from assert
       | failures
        
         | atleta wrote:
         | I agree with the general observation, but the need for ";" ?
         | Quite a few languages (over a few generations) have been doing
         | fine without the semicolon. Just to mention two: python and
         | haskell. (Yes, python has the semicolon but you'll only ever
         | use it to put multiple statements on a single line.)
        
           | Cu3PO42 wrote:
           | Haskell has the semicolon for the same reason!
        
           | lelanthran wrote:
           | > I agree with the general observation, but the need for ";"
           | ? Quite a few languages (over a few generations) have been
           | doing fine without the semicolon. Just to mention two: python
           | and haskell. (Yes, python has the semicolon but you'll only
           | ever use it to put multiple statements on a single line.)
           | 
           | But then it's inconsistent and has unnecessary complexity
           | because now there's one (or more) exceptions to the rules to
           | remember: when the ';' is needed. And of course if you get it
           | wrong you'll only discover it at runtime.
           | 
           | "Consistent applications of a general rule" is preferable to
           | "An easier general rule but with exceptions to the rule".
        
             | labawi wrote:
             | When you use ; and possibly {, }, code statements / blocks
             | are specified redundantly (indentation + separators), which
             | can cause inconsistent interpretation of code by compiler /
             | readers.
             | 
             | I find it much, much easier to look at code and parse
             | blocks via indentation, than the many ways and exceptions
             | of writing ; and {, }, while an extra or missing ';' or {}
             | easily remains unspotted and leads to silly CVEs.
        
             | nightcracker wrote:
             | Have you ever used Python? If you did you really wouldn't
             | be saying this. There isn't an exception. The semicolon is
             | used to put multiple statements on a single line. That's
             | it's only use, and that's the only time it's 'needed' - no
             | exceptions.
        
               | lelanthran wrote:
               | > Have you ever used Python? If you did you really
               | wouldn't be saying this. There isn't an exception.
               | 
               | For the ';', perhaps not. For the token that is used to
               | terminate (or separate) statements? Yes, the ';' is an
               | exception to the general rule of how to terminate
               | statements.
               | 
               | The semicolon also works on some sort of statements and
               | not others, throwing errors only at runtime.
               | 
               | It's easier to remember one rule than many.
        
               | pedrovhb wrote:
               | Honestly, the rule is "don't use semicolons in Python". I
               | don't think there's a single one in the large codebase I
               | work with, and there's really no reason at all to use it
               | other than maybe playing code golf.
               | 
               | It's not a language in which you ever need be saving
               | bytes on the source code. Just use a new line and indent.
               | It's more readable and easier.
        
               | c-cube wrote:
               | But python has instead the "insert \ sometimes" rule,
               | which isn't better.
        
             | atleta wrote:
             | There are no exceptions. You only need it if/when you want
             | to put multiple statements on a single line. That's its
             | sole purpose.
             | 
             | And I'd also add that it's something that you almost never
             | do. One practical use is writing single line scripts that
             | you pass to the interpreter on the command line. E.g.
             | `python -c 'print("first command"); print("second
             | command")'`
             | 
             | If you don't know about the `;` at all in python then you
             | are 100% fine.
        
           | ufo wrote:
           | Another inreresting example is Lua. It's a free form language
           | without semicolons. It's not indentation sensitive.
        
             | samatman wrote:
             | Lua does have semicolons!
             | 
             | It even has semicolon insertion, but because the language
             | is carefully designed, this doesn't cause problems, and
             | most users can go a lifetime without knowing about it.
             | 
             | Our coding style requires semicolons for uninitialized
             | variables, so you'll see
             | 
             | ``` local x; if flag then x = 12 else x = 24 end ```
             | 
             | As a way of marking that the lack of initialization is
             | deliberate. `local x = nil` is used only if x might remain
             | nil.
        
           | yakshaving_jgt wrote:
           | > Yes, python has the semicolon but you'll only ever use it
           | to put multiple statements on a single line.
           | 
           | This is also true of Haskell btw.
        
         | asiachick wrote:
         | What do think of implicit member access (C++, Java, C#) vs
         | explicit (python, javascript)? Is there a concrete argument one
         | way or the other?
         | 
         | I feel like I prefer explicit                   self.member =
         | value         this.member = value
         | 
         | vs implicit                   member = value
         | 
         | But clearly C++/Java/C# people are happy with implicit ...
         | though many of them try to make it explicit by using a naming
         | convention.
        
           | mcv wrote:
           | The fact that people introduce naming conventions to keep
           | track of member variables is probably the biggest
           | condemnation of implicit member access. People clearly need
           | to know this, so you'd better make it explicit.
           | 
           | It's actually a bit surprising that this is one thing that
           | javascript does better than Java. In most other areas, it's
           | Java that's (sometimes overly) explicit.
        
           | coopierez wrote:
           | That was my single biggest pet-peeve of C++. A variable
           | appears in the middle of a member function? Good luck
           | figuring out what owns it. Is it local? Owned by the class?
           | The super-class? (And in that case - which one?)
           | 
           | The added mental load of tracking variables' sources builds
           | up.
        
             | logicchains wrote:
             | FWIF, most C++ style guards recommend writing member
             | variables like mVariableName or variable_name_ so they're
             | easy to distinguish from local variables, and modern C++
             | doesn't generally make much use of inheritance so there's
             | usually only one class it could belong to.
        
           | aasasd wrote:
           | I can tell for certain that as a JS/Python man, every time I
           | look through Java code I have to spend a bit of time when
           | stumbling upon such access, until I remember that it's a
           | thing in Java. Pity that Kotlin apparently inherited it.
           | 
           | But at least, to my knowledge, in Java these things can't
           | turn out to be global vars. Having this 'feature' in JS or
           | Python would be quite a pain in the butt.
        
         | linspace wrote:
         | > implicit declaration of variables
         | 
         | This is so true. I really like Julia and I know that explicitly
         | declaring variables would be detrimental to adoption but I
         | prefer it to the alternative, which is this:
         | https://docs.julialang.org/en/v1/manual/variables-and-scopin...
        
         | goatinaboat wrote:
         | _This is one of those great ideas that sadly one needs
         | experience to realize are really bad ideas. Every new
         | generation of programmers has to relearn it._
         | 
         | It's a bad idea because ASCII already includes dedicated
         | characters for field separator, record separator and so on.
         | These could easily be made displayable in a text editor if you
         | wanted just as you can display newlines as |. Anyone who
         | invents a format that involves using normal printable
         | characters as delimiters and escaping them when you need them,
         | is, I feel very confident in saying, grotesquely and
         | malevolently incompetent and should be barred from writing
         | software for life. CSV, JSON, XML, YAML, all guilty.
        
           | spion wrote:
           | how do you write them though
        
             | teddyh wrote:
             | Ctrl-\, Ctrl-], Ctrl-^ and Ctrl-_ for file, group, record
             | and unit separator, respectively.
             | 
             | However, your tty driver, terminal or program are all
             | likely to eat them or munge them. Also, virtually nothing
             | actually uses these characters for these purposes.
        
               | goatinaboat wrote:
               | _virtually nothing actually uses these characters for
               | these purposes._
               | 
               | Right. Which is why we have all these hilarious escaping
               | and interpolation problems. Any why programmers will
               | never be taken seriously by real engineers. It's like we
               | have cement mixed and ready to go but we decide to go and
               | forage for mud instead and think that makes us cleverer
               | than the cement guys.
        
           | tjalfi wrote:
           | > It's a bad idea because ASCII already includes dedicated
           | characters for field separator, record separator and so on.
           | 
           | ASCII is over 60 years old and separators haven't caught on
           | yet; what's different now?
           | 
           | > These could easily be made displayable in a text editor if
           | you wanted just as you can display newlines as |.
           | 
           | Can you name a common text editor with support for ASCII
           | separators? It's a lot easier to use delimiters and escaping
           | then change every text editor in the world.
           | 
           | > Anyone who invents a format that involves using normal
           | printable characters as delimiters and escaping them when you
           | need them, is, I feel very confident in saying, grotesquely
           | and malevolently incompetent and should be barred from
           | writing software for life. CSV, JSON, XML, YAML, all guilty.
           | 
           | All of the formats you rant about are widely used, well
           | supported, and easy to edit with a text editor - none of
           | these are true of ASCII separators. People chose formats they
           | can edit today instead of formats they might be able to edit
           | in the future. All of these formats have some issues but none
           | of the designers were incompetent.
        
           | erik_seaberg wrote:
           | US-ASCII only has four information separators, and I believe
           | they can only be used in a four-layer schema with no
           | recursion, sort of like CSV (if your keyboard didn't have a
           | comma or return key). When you need to pass an object with
           | records of fields _as a field_ you're out of luck, and
           | everyone has to agree on encoding or escaping them again.
           | 
           | I think SGML (roll your own delimiters and nesting) was
           | pretty close to the Right Thing(tm), but ISO has the specs
           | locked down so everyone had a second-hand understanding of
           | it.
        
           | aasasd wrote:
           | The obvious first step toward the brighter future is to
           | refrain from using any and all software that utilizes the
           | malevolent formats you mentioned. Doing otherwise would mean
           | simply being untrue to one's own conscience and word.
        
         | lixtra wrote:
         | I'm surprised that with your experience you come to such
         | unbalanced conclusions. Everything in engineering is about
         | trade-offs and while your conclusions may be indisputable for
         | the design goals of D they may wrong in other contexts.
         | 
         | 1. If I scribble some one time code etc. the probability of
         | having an error coming from implicit declarations is in the
         | same order of magnitude as missing out edge cases or not
         | getting the algorithm right for most people. The extra
         | convenience may well be worth it.
         | 
         | 2. I would relax this it should be clear to the programmer
         | where a statement ends.
         | 
         | 3. Go on with a warning is a sane strategy in some situations.
         | I happily ruin my car engine to drive out of the dessert. The
         | assert might have been to strict and i know something about the
         | data so the program can ignore the assert failure.
        
           | WalterBright wrote:
           | Your rationale in this and your followups are exactly what
           | I'm talking about.
           | 
           | 1. You're actually right if the entire program is less than
           | about 20 lines. But bad programs always grow, and implicit
           | declaration will inevitably lead you to have a bug which is
           | really hard to find.
           | 
           | 2. The trouble comes from programmer typos that turn out to
           | be real syntax, so the compiler doesn't complain, and people
           | tend to be blind to such mistakes so don't see it. My
           | favorite actual real life C example:                   for (i
           | = 0; i < 10; ++i);         {             do_something();
           | }
           | 
           | My friend who coded this is an excellent, experienced
           | programmer. He lost a day trying to debug this, and came to
           | me sure it was a compiler bug. I pointed to the spurious ;
           | and he just laughed.
           | 
           | (I incorporated this lesson into D's design, spurious ;
           | produce a compiler error.)
           | 
           | 3. I used to work for Boeing on flight critical systems, so I
           | speak about how these things are really designed. Critical
           | systems always have a backup. An assert fail means the system
           | is in an unknown, unanticipated state, and cannot be relied
           | on. It is shut down and the backup is engaged. The proof of
           | this working is how incredibly safe air travel is.
        
             | lixtra wrote:
             | > 3. I used to work for Boeing on flight critical systems,
             | so I speak about how these things are really designed.
             | Critical systems always have a backup. An assert fail means
             | the system is in an unknown, unanticipated state, and
             | cannot be relied on. It is shut down and the backup is
             | engaged.
             | 
             | I ask you to reconsider your assumptions. How did this play
             | out in the 737 MAX crashes? Was there a backup AoA sensor?
             | Did MCAS properly shut down and backup engaged? Was manual
             | overriding the system not vital knowledge to the crew?
             | 
             | You don't have to answer. I probably wouldn't get it
             | anyway.
             | 
             | But rest assured that I won't try to program flight control
             | and I strongly appreciate your strive for better software.
        
               | WalterBright wrote:
               | > How did this play out in the 737 MAX crashes?
               | 
               | They didn't follow the rule in the MCAS design that a
               | single point of failure cannot lead to a crash.
               | 
               | > Was manual overriding the system not vital knowledge to
               | the crew?
               | 
               | It was, and if the crew followed the procedure they
               | wouldn't have crashed.
        
           | unionpivo wrote:
           | I disagree with most of what you said but I want to
           | specifically call out:
           | 
           | > 3. Go on with a warning is a sane strategy in some
           | situations.
           | 
           | No, if its sometimes ok, to continue, than you should not
           | assert it.
           | 
           | Assert means "I assert this will always be true, and if it's
           | not our runtime is in unknown/bad state."
           | 
           | If you think you can recover, or partially recover,
           | throw/return appropriate error, and go into
           | emergency/recovery mode.
        
             | lixtra wrote:
             | Your reactor is boiling. Your control software shut down
             | with assertion failed: temperature too high, cannot display
             | more than 3 digits.
             | 
             | Downvote me if you want to open a bug ticket with the
             | vendor and wait a week for the fix.
             | 
             | Upvote me if you'd give it a try to restart with a switch
             | to ignore assertions.
             | 
             | You may abstain if you never shipped a bug.
             | 
             | Edit: not to forget that this website runs on lisp which
             | violates all three. Was it really a bad choice for the
             | website?
        
               | unionpivo wrote:
               | > Your reactor is boiling. Your control software shut
               | down with assertion failed: temperature too high, cannot
               | display more than 3 digits.
               | 
               | Several points:
               | 
               | 1. Most of such critical components have several
               | different and independent implementations, with analog
               | backup (if possible).
               | 
               | 2. You are arguing one specific safety critical case,
               | that 99.999% or even more programmers will never face,
               | should somehow inform decision about general purpose
               | programming language.
               | 
               | 3. Even if you are working in such safety critical
               | situation, you should not really on assertion bypass, but
               | have separate emergency procedure, which bypasses all the
               | checks and try's to force the issue. (ever saw a --force
               | flag ?)
               | 
               | Because what happens in reality, is developer encounters
               | a bug (maybe while its still in development), notice you
               | can bypass it by disabling assertions (or they are
               | disabled by default), log it as a low priority bug, that
               | never gets fixed.
               | 
               | Then a decade later me or someone like me is cursing you
               | because you enterprise app just shit the bed, and is
               | generating tons of assertion warnings, even when it
               | running normally, so I have to figure out, which of them
               | are "just normal" program flow, and which one just caused
               | an outage.
               | 
               | I never experienced situation like you described, but I
               | have experienced behavior like I wrote above, too many
               | times.
               | 
               | Botom line is:
               | 
               | - don't assert if you don't mean it
               | 
               | - if you need bypass for various runtime checks, code one
               | in explicitly.
               | 
               | Edit: Hacker News is written in ARC which is schema
               | dialect. ARC doesn't have assertions as far as i can
               | tell.
               | 
               | ARC doesn't have its own runtime and is run on racket
               | language, that has optional assertion, that exit the
               | runtime if they fail https://docs.racket-lang.org/ts-
               | reference/Utilities.html
        
               | jrockway wrote:
               | I agree with this. Nuclear reactors are a special case of
               | systems where removing energy from the system makes it
               | more unsafe, because it generates its own energy and
               | without a control system it will generate so much energy
               | that it destroys itself (and due to the nature of
               | radiation, destroys the surrounding suburbs too).
               | 
               | With most systems, the safest state is off. CNC machine
               | making a weird noise? Smash that e-stop. Computer
               | overheating? Unplug it. With this in mind, "assert"
               | transitions the system from an undefined state to an
               | inoperative state, which is safer.
               | 
               | That isn't to say that that you want bugs in your code,
               | and that energizing some system is free of consequences.
               | Your emergency stop of your mill just scrapped a $10,000
               | part. Unplugging your server made your website go down
               | and you lost a million dollars in revenue. But, it didn't
               | kill someone or burn the building down, so that's nice.
        
               | WalterBright wrote:
               | See my previous reply. Your reactor design is susceptible
               | to a single point of failure, and, how do I say it
               | strongly enough, is an utterly incompetent design.
               | Bypassing assertions is not the answer.
        
           | tralarpa wrote:
           | > 1. If I scribble some one time code
           | 
           | .... and here is another entry for Walter's list of bad
           | ideas:
           | 
           | 4. "It's okay. I will use this code only once"
        
             | erik_seaberg wrote:
             | My favorite Red Green quote is "now, this is only temporary
             | ... unless it works."
        
         | jhomedall wrote:
         | F#, Kotlin, Python, Nim and many others all seem to get by fine
         | without semicolons as statement terminators.
        
           | WalterBright wrote:
           | In Python, a newline is a token and serves as a statement
           | terminator.
           | 
           | What I'm referring to is the notion that:                   a
           | = b c = d;
           | 
           | can be successfully parsed with no ; between b and c. This is
           | true, it can be. But then it makes errors difficult to
           | detect, such as:                   a = b         *p;
           | 
           | Is that one statement or two?
        
       | jansan wrote:
       | Norway is one of the luckiest countries in the world. They have a
       | vast amount of resources, can produce their electrical energy
       | entirely from hydropower, have a great democracy, a government
       | they can trust, a beautiful landscape and great people.
       | 
       | I must say that I feel a little bit of relief to see that they
       | have problems that nobody else has, besides insanely expensive
       | alcohol that is only sold in "wine monopoly" stores that are more
       | heavily guarded than banks.
        
       ___________________________________________________________________
       (page generated 2021-04-03 23:01 UTC)