[HN Gopher] The Norway Problem
___________________________________________________________________
The Norway Problem
Author : dedalus
Score : 576 points
Date : 2021-04-02 13:12 UTC (1 days ago)
(HTM) web link (hitchdev.com)
(TXT) w3m dump (hitchdev.com)
| gitowiec wrote:
| I don't like YAML because when I need to write configuration in
| it I waste time trying to remember the syntax. I have a much
| better understanding of JSON, because I use it on an almost daily
| basis.
| vanshg wrote:
| Why not just use Python itself for storing configurations? You
| can be explicit about the data type, with no need to parse anything.
| DennisP wrote:
| Something that used to plague me is that I had database processes
| importing Excel docs from clients, and if the first few rows in a
| column were numbers, SQLServer assumed that all the values must
| be numbers. Then it would run into cells containing other
| strings, and instead of revising its assumption, it would just
| import them as null. Since clients often didn't have great data
| hygiene, it was a problem.
|
| I finally solved it by exporting to csv, and using third-party
| software that handled its own import and did it correctly.
| dpratt71 wrote:
| I don't understand why Haskell gets brought up in the middle of
| an otherwise interesting and useful article. This sort of thing
| cannot happen in Haskell. And while Haskell is not universally
| admired, I can't recall seeing Haskell's flavor of type inference
| being a reason why someone claimed to dislike Haskell.
| sdfhbdf wrote:
| What I am most baffled by with Yaml is the fact that it's a
| superset of JSON.
|
| Whenever an input accepts YAML you can actually pass in JSON
| there and it'll be valid
|
| It really surprised me when I found out, and I have used JSON
| whenever possible since then because it's much stricter
|
| https://en.m.wikipedia.org/wiki/JSON#YAML
| dragonwriter wrote:
| > Whenever an input accepts YAML you can actually pass in JSON
| there and it'll be valid
|
| Strictly speaking, this is only true of YAML 1.2, not YAML
| 1.0-1.1 (the article here addresses YAML 1.1 behavior, the
| headline example of which was removed in YAML 1.2 twelve years
| ago), though it calls YAML 1.1 "YAML 2.0", which doesn't
| actually exist.
|
| Of course, there are lots of features, like custom types, that
| JSON doesn't support, but you can still use YAML's JSON-style
| syntax instead of actual JSON, for them.
| alephu5 wrote:
| Yes this is usually the best way. If you need some features for
| code reuse there are several preprocessors. I personally use
| Dhall to configure everything and then convert it to JSON for
| my application to consume. It is a lot more powerful than YAML
| and has a very safety-oriented type system.
| norrius wrote:
| > Whenever an input accepts YAML you can actually pass in JSON
| there and it'll be valid
|
| ...unless your parser strictly implements YAML 1.1, in which
| case you should be careful to add whitespace around commas (and
| a few other minor things). This is a valid JSON that some YAML
| parsers will have problems with:
| {"foo":"bar","\/":10e1}
|
| The very first result Google gives me for "yaml parser" is
| https://yaml-online-parser.appspot.com, which breaks on the
| backslash-forward slash sequence.
| knorker wrote:
| NO problem.
| lifeisstillgood wrote:
| Hang on. The strict model seems off.
|
| In the first model entering
|
| GB 9.3
|
| gets you a string and a number.
|
| But the second gets you two strings?
|
| _Both_ are wrong in my opinion.
|
| "GB" 9.3
|
| is the correct approach
|
| Explicit beats implicit every time.
| Waterluvian wrote:
| I have never gotten far into a project and thought, "my config
| files are too verbose. I wish there were clever shorthands."
|
| Does Yaml have any sort of strict mode?
|
| I imagine I could find a linter that disallows implicit strings.
| exyi wrote:
| Not YAML by itself, but there are libraries that parse a YAML-
| like format that is typed. For example this one:
| https://hitchdev.com/strictyaml/. Technically, it is not
| compatible with the YAML spec.
| yakshaving_jgt wrote:
| > it's equally true that extremely strict type systems require a
| lot more upfront and the law of diminishing returns applies to
| type strictness - a cogent answer to the question "why is so
| little software written in haskell?"
|
| I was with the article up until that point. I don't agree that
| diminishing returns with regards to type strictness applies
| linearly. Term-level Haskell is not massively harder than writing
| most equivalent code in JavaScript -- in fact I'd say it's easier
| and you reap greater benefit. Perhaps it's a different story when
| you go all-in on type-level programming, but I'm not sure that's
| what the author was getting at. This smells of the _Middle
| Ground_ logical fallacy to me. Or of course the comment was
| tongue-in-cheek and I'm overreacting.
| choeger wrote:
| That law of diminishing returns might actually apply, I am not
| 100% sure. But more powerful type systems allow for the more
| complex composition of more complex interfaces in a safe
| manner. Think of higher-level modules and data structures. Or
| dependent types and input handling. Or linear types and
| resource handling.
| samvher wrote:
| I agree. I would say that Erlang goes ~80% of the way compared
| to Haskell's type system and the last 20% really matter, to the
| point that in many cases I find myself not really using
| Erlang's (optional) type system at all. Better type coverage
| and more descriptive types allow the compiler to infer more and
| I'd say this is the opposite of diminishing returns.
| 7952 wrote:
| I had to rewrite some JavaScript code in Postgres recently that
| measured the overlap between different elevation ranges. In JS
| I had to write it myself and deal with the edge cases and bugs.
| In Postgres I just use the range type and some operators. It
| was brilliant in comparison. The tiny effort of learning it was
| worth it. The list of data types I use all the time is bigger
| than just string, numbers and booleans. Serialisation formats
| should support them. Particularly as there are often text
| format standards that already exist for a lot of them. Give me
| wkt geometry and iso formatted dates. It's not that difficult
| and totally worth it.
| kstenerud wrote:
| The worst tragedy of this is the security implications of subtly
| different parsers. As your application surface increases, you're
| likely to mix languages (and thus different parsers), which means
| that the same input data will produce different output data
| depending on whether your parser replaces, truncates, ignores, or
| otherwise attempts to automatically "fix up" the data. A
| carefully crafted document could exploit this to trick your data
| storage layer into storing truncated data that elevates
| privileges or sets zero cost, while your access control layer
| that ignores or replaces the data is perfectly happy to let the
| bad document pass by.
|
| And here's something else to keep you up at night: Just think of
| how many unintentional land mines lurk in your serialized data,
| waiting to blow up spectacularly (or even worse, silently) as
| soon as you attempt to change implementation technologies!
|
| This is why I've been so anal about consistent decoder behavior
| in Concise Encoding https://github.com/kstenerud/concise-
| encoding/blob/master/ce...
|
| https://concise-encoding.org/
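The parser-mismatch risk described in the comment above can be illustrated with duplicate keys in JSON, which decoders are free to resolve differently. This is a minimal sketch using only Python's standard `json` module; the `price` document and the helper names `reject_duplicate_keys` and `strict_loads` are made up for illustration and are not taken from Concise Encoding or any other project mentioned in the thread.

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook that fails loudly instead of silently picking a winner."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


def strict_loads(text):
    """Decode JSON, rejecting any object that repeats a key (hypothetical helper)."""
    return json.loads(text, object_pairs_hook=reject_duplicate_keys)


# Hypothetical document with a repeated key.
doc = '{"price": 100, "price": 0}'

# Python's json module keeps the last occurrence by default; a decoder
# that keeps the first occurrence would see a different price.
lenient = json.loads(doc)
```

If one layer of a system validates the document with a first-wins decoder and another stores it with a last-wins decoder, the two layers disagree about the same bytes. Rejecting the ambiguous input outright is one way to close that gap.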
| yellowapple wrote:
| This is exactly why configuration/serialization formats should
| make as few assumptions about value types as possible. Once
| parsing's done, everything should be a string (or possibly a
| symbol/atom, if the program ingesting such a file supports
| those), and it should be up to the application to convert values
| to the types it expects. This is Tcl's approach, and it's about
| as sensible as it gets.
|
| ...which is why it pains me to admit that in my own project for a
| Tcl-like scripting/config language[1] I missed the float v.
| string issue, so it'll currently "cleverly" return different
| types for 1.2 (float) v. 1.2.3 (atom). Coincidentally, I started
| work on a "stringy" alternative interpreter that hews closer to
| Tcl's philosophy (to fix a separate issue - namely, to avoid
| dynamically generating atoms, and therefore avoid crashing the
| Erlang VM when given potentially-adversarial input), so I'm gonna
| fix that case for at least the "stringy" mode (by emitting
| strings instead of numbers, too), knocking out two birds with one
| stone for the upcoming 0.3.0 release :)
|
| ----
|
| [1]: https://otpcl.github.io, for those curious
| progval wrote:
| > Once parsing's done, everything should be a string
|
| Or give a schema to the parser, defining what type is expected
| in each field.
| kenshoen wrote:
| Yes, that looks like the right way to handle this problem
| without ignoring the YAML spec. Define what to parse upfront.
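The "everything is a string, then apply a schema" idea from the comments above can be sketched in a few lines. This is an illustration under stated assumptions, not how StrictYAML or any real YAML library is implemented: `parse_flat` handles only flat `key: value` lines, and the names `parse_flat`, `apply_schema`, `SCHEMA` and the sample config are all hypothetical.

```python
from datetime import date


def parse_flat(text):
    """Parse flat 'key: value' lines; every value stays a string."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        result[key.strip()] = value.strip()
    return result


def apply_schema(raw, schema):
    """Convert each field with the converter the schema names for it."""
    typed = {}
    for key, convert in schema.items():
        if key not in raw:
            raise KeyError(f"missing field: {key}")
        typed[key] = convert(raw[key])
    return typed


# The application, not the parser, decides what each field means.
SCHEMA = {
    "country": str,
    "version": str,
    "rate": float,
    "released": date.fromisoformat,
}

CONFIG = """
country: NO
version: 1.2
rate: 9.3
released: 2020-03-25
"""

config = apply_schema(parse_flat(CONFIG), SCHEMA)
```

Here `NO` stays the string "NO" and `1.2` stays the string "1.2" because the schema says so, while `rate` and `released` are converted only because the schema asks for it.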
| dkersten wrote:
| It's reasons like this that I want my configuration languages
| to be explicit and unambiguous. This is why I use JSON or if I
| want a human friendly format, TOML. Strings are always "quoted"
| and numbers are always unquoted 1.2, it can never accidentally
| parse one as the other. The convenience of omitting quotes is
| just not worth the potential for ambiguity or edge cases to me.
| [deleted]
| eitland wrote:
| There exists a couple of mainstream languages that are full of
| these sorts of _interesting_ behavior, one of them is supposedly
| cool and productive and the other is supposedly ugly and evil.
| brohee wrote:
| The "Wat?" talk has quite a few examples and is hilarious.
|
| https://www.destroyallsoftware.com/talks/wat
| drno123 wrote:
| Python vs JavaScript?
| speedgoose wrote:
| Python vs PHP also.
| pansa2 wrote:
| > _full of these sorts of interesting behavior_
|
| I don't think that applies to Python - it's quite strongly
| (although not statically) typed. I agree that it does apply
| to JavaScript and PHP.
| eitland wrote:
| Javascript and PHP is correct.
| exyi wrote:
| I think this applies to Python pretty well. Although
| certainly not as bad as PHP, most JS traps also exist in
| Python (falsy values, optional glitchy semicolons,
| function scoped variables, mutable closure). There are
| many JS specific traps like this and also other Python
| specific ones (like static fields are also instance
| fields, Python versions and library dependency hell).
| However I find it easier to avoid them in JS than in
| Python with TypeScript, avoiding classes, ...
| lunfard00 wrote:
| and yet I don't see anyone complain about bash which is
| arguably far worse than those 2. When things get hard in bash,
| you will start to see python scripts in CI and the whole thing
| is a completely unreadable mess
| masklinn wrote:
| > I don't see anyone complain about bash
|
| You're not looking really hard then, but really
|
| > When things get hard on bash, you will start to see python
| scripts
|
| That's kinda the thing innit? Unless the system specifically
| only allows shell scripts (something I don't think I've ever
| encountered though I'm sure it exists) it's quite easy to
| just use something else when bash sucks, so while people will
| absolutely complain about it they also have an escape: don't
| use bash.
|
| When a piece of software uses YAML for its configuration
| though, you don't really have such an option.
|
| Furthermore, bash being a relatively old technology people
| know to avoid it, or what the most common pitfalls are.
| Though they'll still fall into these pitfalls regularly.
| lunfard00 wrote:
| There is a lot of elitism around bash, like the "Arch btw"
| thing but far worse because a lot of important things
| depend on it.
|
| Powershell has been working on linux for quite a while now
| and doesn't seem to get any attention even though it has nice
| IDE support and copies the good things about bash.
| lokedhs wrote:
| It doesn't copy all the good things about the Unix shell
| though.
|
| The reason people are comfortable with the POSIX shell is
| because you use the same syntax for typing commands
| manually as you do for scripts. But, you're going to have
| a hard time finding people who prefer writing:
|     Remove-Item some/directory -recursive
|
| Rather than
|     rm -fr some/directory
|
| People who write shellscripts are often not seeing
| themselves writing a "program". They are just automating
| things they would do manually. Going to an IDE in this
| case is not something you'd consider.
|
| I happen to be very aware of all the pitfalls in POSIX
| shell, and it's rare that I see a shellscript where I
| cannot immediately point out multiple potential problems,
| and I definitely agree that most scripts should probably
| be written in a language that doesn't contain so many
| guns aimed at the user's feet. I'm just pointing out a
| likely reason why people are not adopting powershell in
| the huge numbers that Microsoft may have hoped for.
| majkinetor wrote:
| Nonsense. This is the same in powershell:
|     rm -r -f some/directory
| disgruntledphd2 wrote:
| Bash is a total disaster, I complain about it all the time.
| Unfortunately, rather like JS, it's unavoidable.
| eitland wrote:
| I'd not consider bash a
|
| 1. mainstream
|
| 2. programming language
|
| (of course _technically_ it is a programming language, but it
| is also more precisely a scripting language)
| earthboundkid wrote:
| It's fashionable to hate XML because it was used in a lot of
| places it was a bad fit in the 00s, but at least it's a pretty
| good document language.
|
| YAML though is always a bad fit. If you want machine readable
| config, use JSON; human readable, use TOML. When does YAML ever
| fit?
| suttree wrote:
| Reminds me that the reasoning behind austerity came from an Excel
| calculation that didn't include all the relevant rows :~/
|
| https://www.theguardian.com/politics/2013/apr/18/uncovered-e...
|
| https://www.bbc.co.uk/news/magazine-22223190
|
| https://theconversation.com/the-reinhart-rogoff-error-or-how...
| blunte wrote:
| If you want no misunderstandings, be explicit. This applies to
| YAML and life in general. There's an annoying but fairly accurate
| saying about assumptions that applies.
|
| If you want something to be a specific type, you better have an
| explicit way of indicating that. If you say quotes will always
| indicate a string, great. Of course we know it's not that simple,
| since there are character sets to consider.
|
| The safest answer is to do something like XML with DTDs. But that
| imposes a LOT of overhead. Naturally we hate that, so we make
| some "convention over configuration" choices. But eventually, we
| hit a point where the invisible magic bites us.
|
| This is one case where tests would catch the problem, if those
| tests are thorough enough - explicitly testing every possibility
| or better yet, generative testing.
| korijn wrote:
| Or just opening your browser and trying out norwegian on a QA
| environment.
| thrower123 wrote:
| I've never seen anything that used YAML that I didn't want to
| douse with gasoline, nuke from orbit, and then salt the ground
| where it once stood.
|
| I cry and rage and rend my clothes when I stumble upon some new
| thing that makes me have to use it.
| dangoor wrote:
| Cue also solves this problem. The "no" example is right on the
| front page: https://cuelang.org
|
| I used it for configuration of a Go program recently and found it
| pleasant to work with. I hope the language is declared stable
| soon, because it's a good model.
| [deleted]
| mcv wrote:
| If it ignores part of the spec, I don't think "strictyaml" is the
| correct name here. Instead, if it interprets everything as
| string, perhaps "stringyaml" would have been more accurate,
| though I'm sure that's not as good PR.
|
| I'm reminded of the discussion we had a few days ago about
| environment variables; one problem there is that env variables
| are always strings, and sometimes you do want different types in
| your config. But clearly having the system automatically
| interpret whether it's a string or something else is a major
| source of bugs. Maybe having an explicit definition of which
| field should be which type would help, but then you end up with
| the heavy-handed XML with its XSD schema.
|
| Or you just use JSON, which is light-weight, easy to read, but
| unambiguous about its types. I guess there's a good reason it's
| so popular.
|
| Maybe other systems like yaml and environment variables should
| only ever be used for strings, and not for anything else, and I
| suppose replacing regular yaml with 'strictyaml' could play a
| role there. Or cause unending confusion, because it does violate
| the spec.
| povik wrote:
| "saneyaml" would not make for bad PR
| marcinzm wrote:
| >If it ignores part of the spec, I don't think "strictyaml" is
| the correct name here.
|
| The article didn't fully explain it but strictyaml requires a
| typed schema or defaults to string (or list or dict) if one is
| not provided. So it strictly follows the provided schema.
| mcv wrote:
| That makes a big difference indeed. It wasn't clear to me
| from the article, but string yaml + optional schema sounds
| like a useful combination.
| msiemens wrote:
| > JSON, which is [...] unambiguous about its types
|
| With the one exception that with floating point values the
| precision is not specified in the JSON spec and thus is
| implementation defined[1] which may lead to its own issues and
| corner cases. It for sure is better than YAML's 'NO' problem,
| but depending on your needs JSON may have issues as well
|
| [1]: https://stackoverflow.com/questions/35709595/why-would-
| you-u...
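The precision caveat above is easy to reproduce. A minimal sketch with Python's standard `json` module: Python decodes JSON integers into arbitrary-precision ints, while `parse_int=float` is used here only to mimic a decoder that stores every number as an IEEE 754 double (as JavaScript does). The sample value is chosen for illustration.

```python
import json

# 2**53 + 1 is the smallest positive integer a double cannot represent.
text = "9007199254740993"

# Python's decoder keeps integers exact.
exact = json.loads(text)

# Mimic a decoder that only has doubles: the value is silently rounded.
as_double = json.loads(text, parse_int=float)

print(exact)           # 9007199254740993
print(int(as_double))  # 9007199254740992
```

The same bytes therefore decode to two different numbers depending on the implementation, which is why large identifiers are often sent as JSON strings.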
| wongarsu wrote:
| Also JSON's complete lack of many commonly used types, and no
| way to define any new ones.
| mcv wrote:
| Isn't that a problem with most of these config languages,
| though? XML is the only one where I think this might be
| possible.
| wongarsu wrote:
| Allowing you to define types is quite uncommon, but many
| config languages allow more types than JSON (so more than
| boolean, number, string, list, dict). Date datatypes are
| a big one and are provided by about every second JSON
| variant, in addition to TOML, ION and others.
| lenkite wrote:
| Why does YAML accept unquoted strings ? Be Strict. Be Safe.
| Hendrikto wrote:
| > The real fix requires explicitly disregarding the spec
|
| Or... just quote your strings.
| dragonwriter wrote:
| Or, "use an appropriate schema". Or, for several of the
| specific problems identified in the source article, use YAML
| 1.2 (2009) instead of YAML 1.1 (2005), which the article
| misidentifies as "YAML 2.0" and acts as if it is the current
| spec.
| andrewclunn wrote:
| oh no, we want this value to be parsed as a string, so we need to
| put quotes around it. the humanity!
| jmull wrote:
| > "While the website went down and we were losing money we chased
| down a number of loose ends until finally finding the root
| cause."
|
| Hopefully not a real story. If you're trying out new
| configurations in production and have no mechanism to rollback
| problematic changes, you've got bigger problems than YAML.
|
| To me, though, YAML, including "StrictYAML", doesn't solve any
| problems that JSON, perhaps w/comments, doesn't already solve.
| RcouF1uZ4gsC wrote:
| YAML seems like a really neat idea, but over time, I have
| come to regard it as being too complicated for me to use for
| configuration.
|
| My personal favorite is TOML, but I would even prefer plain JSON
| over YAML
|
| The last thing I want at 2 AM when trying to figure out if
| an outage is due to a configuration change is having to think if
| each line of my configuration is doing the thing I want.
|
| YAML prizes making data look nicely formatted over simplicity or
| precision. That, for me, is not a tradeoff I am willing to make.
| Arnavion wrote:
| They all have their downsides.
|
| JSON:
|
| - no comments, unless you fake them with fake properties,
| unless your configuration has a schema that doesn't allow extra
| fake properties
|
| - no trailing commas; makes editing more annoying
|
| - no raw strings
|
| YAML:
|
| - the automatic type coercion
|
| - the many ways to encode strings (
| https://yaml-multiline.info/ )
|
| - the roulette wheel of whether _this_ particular parser is
| anal about two-space indentation or accepts anything as long as
| it's used consistently
|
| - the roulette wheel of whether _this_ particular parser
| supports uncommon features like anchors
|
| TOML:
|
| - runtime footguns in automated serialization (
| https://news.ycombinator.com/item?id=24853386 )
|
| - hard to represent deeply-nested structures, unless you switch
| to inline tables which are like JSON but just different enough
| to be annoying
| anticristi wrote:
| This makes me sad. It's 2021 and we still haven't figured out
| how to serialize configuration in a format that is easy to
| edit and predictable.
| kstenerud wrote:
| This is the problem space I'm targeting with
| https://concise-encoding.org/
|
| * Text AND binary so that humans can edit easily, and
| machines can transmit energy and bandwidth efficiently.
|
| * Carefully designed spec to avoid ambiguities (and their
| security implications).
|
| * Strong type support so you're not using all kinds of
| incompatible hacks to serialize your data.
|
| * Versioned, because there's no such thing as the perfect
| format.
|
| * Also, the website is 32k bytes ;-)
| yyyk wrote:
| + Has binary format.
|
| + Avoids ambiguities.
|
| - The format seems to feel the need to support
| _everything_ , including things I am not sure are actual
| usecases (what's the point of Markup element for example?
| What does Metadata save us compared to just including it
| in document, given that parsers must parse it anyway?).
| This must make implementation more complex and costly,
| and makes reading the text format more difficult.
|
| - Not a fan of octal notation. At 3am not sure I can't
| confuse 0 and o given certain fonts. Does anyone even use
| it these days?
|
| - Unquoted strings were discussed in the thread, I'd like
| to point out that it's very easy to make an unquoted
| string not "text-safe" (according to the spec) without
| noticing it, at which point document is invalid.
|
| Just add white-space (maybe a user pasted a string from
| somewhere without noticing whitespace at the end or
| forgot the rules), a dot, an exclamation or a question
| mark. Having surprises like that is IMHO worse than a
| consistent quoting method.
|
| Basically all the things I don't like are about the
| format supporting a bit too much. YAML 1.1 should teach
| us more is sometimes less.
| kstenerud wrote:
| Alright that's two votes against unquoted strings so far
| (plus my wife agrees so that's three against!)
|
| I put in octal because it was trivial to implement after
| the others. The canonical format when it's stored or
| being sent is binary, and a decoder shouldn't be
| presenting integers in octal (that would just be weird).
| But a human might want octal when inputting data that
| will be converted to the binary format.
|
| Markup is for presentation data, UI layouts, etc, but
| with full type support rather than all the hacky
| XML+whatever solutions that many UI toolkits are
| adopting. Also, having presentation data in binary form
| is nice to have.
| yyyk wrote:
| Well, unquoted strings work when a format is built for
| that. If the default was "it's text unless we see the
| special sequences" it would be better for unquoted
| strings. But even then there are too many special
| characters in this format IMHO.
|
| I saw there's a 'Media' type in the spec. It seems the
| type is actually for serializing files. But there's no
| "name" (or we can call it "description") field. Of course
| we could accomplish this with a separate field - but then
| again the entire type's functionality could be
| accomplished with a u8x array and a string field. So if
| you're specifying this type at all, might as well add a
| name field to make it useful.
| chousuke wrote:
| I'm skimming through the human readable spec, and it
| seems decent, but I noticed the spec allows unquoted
| strings. What's the reasoning for this? In my experience
| unquoted strings cause nothing but trouble, and are
| confusing to humans who may interpret them as keywords.
|
| Any reason for not using RFC2119 keywords in the spec?
| Using them should make the spec easier to read.
| kstenerud wrote:
| > I noticed the spec allows unquoted strings. What's the
| reasoning for this? In my experience unquoted strings
| cause nothing but trouble, and are confusing to humans
| who may interpret them as keywords.
|
| Unquoted strings are much nicer for humans to work with.
| All special keywords and object encodings are prefixed
| with sigils (@, &, $, #, etc), so any bare text starting
| with a letter is either a string or an invalid document,
| and any bare text starting with a numeral is either a
| number or an invalid document.
|
| > Any reason for not using RFC2119 keywords in the spec?
| Using them should make the spec easier to read.
|
| I use a superset of those keywords to give more precision
| in meaning: https://github.com/kstenerud/concise-
| encoding/blob/master/ce...
| chousuke wrote:
| If strings are always unambiguously detectable, why allow
| quoting them at all? Having two representations for the
| same data means you can't normalize a document
| unambiguously. I can understand having barewords seems
| cleaner for things like map keys, but I am not convinced
| that it's a worthwhile tradeoff.
|
| An important feature of RFC2119 keywords is that they're
| always capitalized (ie. the keyword is "MUST", not
| "Must", or "must"). This makes requirements and
| recommendations stand out amid explanatory text,
| improving legibility. For example, RFC2119 itself uses
| MUST and must with different meanings.
| kstenerud wrote:
| > If strings are always unambiguously detectable, why
| allow quoting them at all?
|
| Because strings can contain whitespace and other
| structural characters that would confuse a parser.
|
| > Having two representations for the same data means you
| can't normalize a document unambiguously.
|
| The document will always be normalized unambiguously in
| binary format. The text format is a bit more lenient
| because humans are involved.
|
| The idea is that the binary format is the source of
| truth, and is what is used in 90% of situations. The text
| format is only needed as a conduit for human input, or as
| a human readable representation of the binary data when
| you need to see what's going on.
|
| > An important feature of RFC2119 keywords is that
| they're always capitalized (ie. the keyword is "MUST",
| not "Must", or "must").
|
| Hmm good point. I'll add that.
| anticristi wrote:
| Nice! I like some concepts that this format proposes, but
| the `@` and `|` modifier feels a bit too "loaded".
| kstenerud wrote:
| It's a compromise; there are only so many letters,
| numbers, and symbols available in a single keystroke on
| all keyboards, and I don't want there to be any ambiguity
| with numbers and unquoted strings (e.g. interpreting the
| unquoted string value true as the boolean value true).
|
| So everything else needs some kind of initiator and/or
| container syntax to logically separate it from the other
| objects when interpreted by a human or machine.
| unhammer wrote:
| https://dhall-lang.org/ ?
| trhway wrote:
| XML with a convenient UI tool to edit it should have fit the
| bill. Yet, for whatever reason, a convenient UI tool would
| never happen to be there when needed, and thus, scared and
| tired of manual editing of XML, the world has embraced
| YAML.
| masklinn wrote:
| > XML with a convenient UI tool to edit it should have fit
| the bill.
|
| "You need this special tool to work" immediately and
| instantly rules out "easy to edit". Or makes the debate
| irrelevant: every format is easy to edit if you have "a
| convenient UI" to do it for you.
| anticristi wrote:
| Opening XMLs in ZIP containers is easy! Just spin up
| Word. :)
| sergeykish wrote:
| The fault was in XML editing, pure data authoring is
| hard. We have convenient UI -- web browser, think of it
| as literate programming, a way to merge man page and
| configuration file.
|
| And plain text editor is a "widely deployed special tool
| to work". Actual data is
|     countries:\n- GB\n- IE\n- FR\n- DE\n- NO
|
| Or
|     636f 756e 7472 6965 733a 0a2d 2047 420a 2d20 4945 0a2d
|     2046 520a 2d20 4445 0a2d
| imhoguy wrote:
| We had such: XML. With proper editor support it is easy. I
| guess it needs rediscovery /s ;)
| anticristi wrote:
| I used XML and didn't like it:
|
| - A proper editor was never around.
|
| - Closing tags were verbose.
|
| - Attributes vs tags was confusing.
|
| - It didn't map "naturally" to common data types, like
| lists, maps, integers, float, etc.
| mattmanser wrote:
| Don't forget about namespaces, another fiddly bit of XML
| that caused all sorts of problems and headaches.
| sergeykish wrote:
| You've just used XML tech as it was designed to post this
| comment.
|
| XML is serialization. I doubt you were concerned
| about serialization while posting your comment, or thought
| about the attributes-tags distinction.
|
| This page uses requests to a server for multi-user
| editing. But it is easy to build a truly serverless (like a
| file) document with the same interface:
|     data:text/html,<html><ul>Host: <span class=host contenteditable>example.com
|
| Change it, save it, done. Web handles input of lists,
| maps, integers, float and much more.
| anticristi wrote:
| You are right. XML is great for encoding the DOM.
| However, I didn't find it practical for interfacing with
| humans, due to the concerns I raised.
| sergeykish wrote:
| It is not practical to edit plain text in binary:
| 636f 756e 7472 6965 733a 0a2d 2047 420a 2d20 4945
| 0a2d 2046 520a 2d20 4445 0a2d
|
| It is not practical to edit Excel documents in plain
| text:
|     <?xml version="1.0"?>
|     <Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
|      xmlns:o="urn:schemas-microsoft-com:office:office"
|      xmlns:x="urn:schemas-microsoft-com:office:excel"
|      xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
|      xmlns:html="http://www.w3.org/TR/REC-html40">
|      <Worksheet ss:Name="Sheet1">
|       <Table>
|        <Row>
|         <Cell><Data ss:Type="String">ID</Data></Cell>
|
| Tim Berners-Lee's browser was a browser-editor. Can't you see
| the parallels?
| tgv wrote:
| > - the automatic type coercion
|
| Only when you "unmarshal" to an untyped data structure and
| then make assumptions about the type. I've used yaml with a
| Go application, and it can't interpret NO as a bool when the
| field is a string.
| Arnavion wrote:
| Correct, like TFA.
| perlgeek wrote:
| For hand-writing I love jsonnet, which produces JSON, is much
| more convenient to write, and has some templating, functions
| etc. https://jsonnet.org/
|
| You wouldn't serialize data structures to jsonnet though,
| you'd just generate JSON.
| makkesk8 wrote:
| We too had this problem, we solved it using the 3 letter country
| code instead.
| zomglings wrote:
| Have had a similar issue when adding git revisions to YAML
| documents.
|
| The problem is that if a YAML parser sees a string like this:
|
| "0123e04"
|
| It interprets it as a number: 123 * 10^4
|
| Our hacky solution was to prefix the revision hashes like
| sha-0123e04, but still this was quite annoying.
|
| After that experience, I have stopped using YAML for any of my
| own configuration. Have started preferring putting my
| configurations in code. And when I don't want that, have found
| JSON good enough for my purposes.
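The failure mode in the comment above, and the Norway problem itself, both come from a parser guessing types from unquoted text. The following is a toy resolver written for illustration only. It is not the resolver of any real YAML library, and it simplifies YAML 1.1 by matching the boolean words case-insensitively. `guess_type` and `YAML11_BOOL_WORDS` are hypothetical names.

```python
# Boolean words from the YAML 1.1 bool type, lowercased (a simplification:
# the spec lists specific capitalizations such as NO, No and no).
YAML11_BOOL_WORDS = {
    "y": True, "yes": True, "on": True, "true": True,
    "n": False, "no": False, "off": False, "false": False,
}


def guess_type(token):
    """Guess a type for an unquoted scalar, the way implicit typing does."""
    lowered = token.lower()
    if lowered in YAML11_BOOL_WORDS:
        return YAML11_BOOL_WORDS[lowered]
    # Try numeric interpretations before falling back to a string.
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


for token in ("GB", "NO", "0123e04", "sha-0123e04"):
    print(repr(token), "->", repr(guess_type(token)))
```

The country code `NO` becomes `False` and the git revision `0123e04` becomes the float 1230000.0, while the prefixed `sha-0123e04` survives as a string. This is why the prefix workaround helps: it only works by making the value impossible to mistake for a number.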
| bsenftner wrote:
| This is such a core issue with a tool like YAML, how the hell did
| this program get so popular? Are there that many developers willy
| nilly using tools that fail in critical, silent ways, and the
| horde of know-nothings follows them?
| throwaway4good wrote:
| Just use json.
| dudeinjapan wrote:
| You say Norway, I say Yesway.
| korijn wrote:
| Edit: downvoters, thanks! I realize this is not an easily
| agreeable opinion ("let's all chant 'death to YAML!'") but it's
| really easy to avoid losing money on something like this. Just do
| proper testing.
|
| Aren't you setting yourself up for surprises if you write file
| formats such as TOML and YAML without reading the documentation,
| learning and experimenting first? How about unit testing? Or
| verifying the type in your config parser? Have you tried opening
| your site in the norway config on your development or testing
| environment? Or even in production? It all seems very basic and
| not at all blog post or even HN worthy.
|
| I'm going to assume the authors still haven't learned their
| lesson and are going to experience many more surprises in the
| future working with plain text file formats.
| ancarda wrote:
| You can catch this with yamllint
| (https://github.com/adrienverge/yamllint):
|
|     % cat countries.yml
|     ---
|     countries:
|      - US
|      - GB
|      - NO
|      - FR
|     % yamllint countries.yml
|     countries.yml
|       5:4       warning  truthy value should be one of [false, true]  (truthy)
| IHLayman wrote:
| Reminds me of the multiple YAML bugs that have plagued Kubernetes
| such as https://github.com/kubernetes/kubernetes/issues/82296
|
| It is interesting how the standard of any language seems to
| diverge due to just the implementation from different parsers.
| jgalt212 wrote:
| another gotcha:
|
| 2020-03-25 -> datetime.date(2020, 3, 25), not "2020-03-25"
| paxys wrote:
| I will never understand why YAML didn't just require quoted
| strings. Did the creator not anticipate how many problems the
| ambiguity would cause?
| mattmanser wrote:
| Never's a strong word, seems quite easy to understand why to
| me. You've got ease of use reasons, historical reasons like the
| mis-guided Robustness principle, etc.
|
| And these sort of things happen time and time again.
|
| And although officially JSON requires quoted strings, almost
| none of the parsers actually enforce that, and so you will find
| a huge amount of JSON out there that is not actually compliant
| with the official spec.
|
| Just like browsers have huge hacks in them to handle misformed
| HTML.
| Safety1stClyde wrote:
| > And although officially JSON requires quoted strings,
| almost none of the parsers actually enforce that
|
| What programming language? I'm not familiar with those
| parsers, the ones I know of very much do enforce quoted
| strings.
|
| > you will find a huge amount of JSON out there that is not
| actually compliant with the official spec
|
| The parsers I use all follow the current JSON RFC
| specification, and I've never encountered any JSON from APIs
| which they reject.
|
| > Just like browsers have huge hacks in them to handle
| misformed HTML.
|
| Web browsers do deal with a variety of things, not so much
| JSON parsers in my experience.
| Macha wrote:
| I think the point is that they accept more than the spec
| dictates - do your JSON parsers accept e.g. the vs code
| config file (JSON with comments) or JSON with unquoted
| keys?
| joelellis wrote:
| The most commonly used parsers only accept valid JSON -
| including the one included within most JS runtimes
| (JSON.stringify/parse). VSCode explicitly uses a `jsonc`
| parser, the only difference being that it strips comments
| before it parses the JSON. There's also such a thing as
| `json5`, which has a few extra features inspired by ES5.
| None of them are unquoted strings. I've never come across
| anything JSON-like with unquoted strings other than YAML,
| and everything not entirely compliant with the spec has a
| different name.
| jokethrowaway wrote:
| Can you name a JSON parser which accepts comments or
| unquoted keys?
|
| I've never seen one
| cesarb wrote:
| IIRC, Gson accepts unquoted keys.
| groundCode wrote:
| I've hit this exact same problem loading YAML in Ruby. Luckily
| caught it before it hit prod, but still, it made me go argh for a
| while.
| abujazar wrote:
| Norwegian here. I'd say the problem is YAML, not Norway :D
| dan-robertson wrote:
| Other reasons to not want types happening during parse time:
|
| - "modified" numbers, e.g. $50, 35%, 1.2345568896347853246863477
|
| - Dates. If your language tries to convert a date to Unix time or
| Julian Day, you can have problems with time zones or distant or
| historical dates.
|
| - strings vs symbols. The person writing config shouldn't have to
| care about this distinction.
|
| - Automatic deduplication for fields of objects can be a problem.
| [deleted]
| mrzool wrote:
| Wow, YAML has definitely some pretty quirky edges.
| kleer001 wrote:
| Ah, tool-centric (not centered on the person), time-offset (the
| ones making the original mistake don't see it) pareidolia. Good
| ol' TCTOP...
| DoofusOfDeath wrote:
| Funny coincidence. Around 2000, I worked for a company that
| coined the term "Norway problem" for a different software
| problem.
|
| Their product used an MVCC database (I think ObjectStore). One of
| their customers in Norway had a problem where updates to the
| database seemed to not show up. IIRC the problem was a bug in
| this company's software that caused MVCC to show an older version
| of the database content than expected.
| LeonM wrote:
| I was bitten with this issue some time ago.
|
| The Stripe library has constants for which type of VAT number is
| supplied. One of those constants is 'NO_VAT'...
|
| Needless to say, this caused me some grey hairs
| schoen wrote:
| Recent related HN discussion:
| https://news.ycombinator.com/item?id=26365365
| azernik wrote:
| YAML had a worse example, once.
|
| For the ease of entering time units YAML 1.1 parsed any set of
| two digits, separated by colons, as a number in sexagesimal (base
| 60). So 1:11:00 would parse to the integer 4260, as in 1 hour and
| 11 minutes equals 4260 seconds.
|
| Now try plugging MAC addresses into that parser.
|
| The most annoying part is that the MAC addresses would _only_ be
| mis-parsed if there were no hex digits in the string. Like the
| bug in this post, it could only be reproduced with specific
| values.
|
| Generally, if you're doing implicit typing, you need to keep the
| number of cases as low as possible, and preferably error out in
| case of ambiguity.
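
A minimal sketch of the base-60 rule, assuming the integer pattern given for sexagesimal values in the YAML 1.1 `int` type description (transcribed from memory; individual parsers may differ). `parse_sexagesimal` is a hypothetical helper, and the MAC addresses are made-up examples.

```python
import re

# Base-60 integer pattern as described for the YAML 1.1 "int" type
# (assumption: every group after the first must be in the range 0-59).
SEXAGESIMAL = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")


def parse_sexagesimal(scalar: str):
    """Return the base-60 integer value, or None if the scalar is not one."""
    if SEXAGESIMAL.fullmatch(scalar) is None:
        return None
    sign = -1 if scalar.startswith("-") else 1
    value = 0
    for part in scalar.lstrip("+-").replace("_", "").split(":"):
        value = value * 60 + int(part)
    return sign * value


print(parse_sexagesimal("1:11:00"))            # 1 hour 11 minutes, in seconds
print(parse_sexagesimal("52:54:00:12:34:56"))  # all-decimal MAC: becomes an int
print(parse_sexagesimal("52:54:00:ab:cd:ef"))  # hex digits present: no match
```

The second and third calls show why the failure is value-dependent: only addresses whose groups all look like decimal numbers below 60 are swallowed.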
| m463 wrote:
| slightly related, on my microwave 99 > 100, even 61 > 100
| roland35 wrote:
| I try to optimize my microwave button pushing too. I also
| have a +30 seconds button, so for 1:30 I can hit
| "1,3,0,Start" or "+30" three times and save a press!
| bonzini wrote:
| Why does your microwave compare numbers?
| sokoloff wrote:
| It doesn't compare them, it just counts down.
|
| If I enter 1-3-0-start, I get 90 seconds of cooking. If I
| enter 9-9-start, I get 99 seconds of cooking, so in that
| sense, 99 > 130.
|
| If I want about 90 seconds, I'll use 88 as it's faster to
| enter (fewer finger movements).
| JoeAltmaier wrote:
| I've done the same thing for decades! Soul mates?
| pdkl95 wrote:
| Vi Hart - "How to Microwave Gracefully"
|
| https://www.youtube.com/watch?v=T9E0zSpULFY
| sokoloff wrote:
| You might like this one as well.
|
| Load soap into the dishwasher _after emptying_ rather
| than after _loading_. If the soap dispenser is closed,
| the dishes are dirty.
| sneak wrote:
| My rule is that loading the dishwasher means that one
| loads all the available dishes, and runs it, even if it's
| only x% full. We use the (large) sink as an input buffer.
|
| If the dishwasher has dishes in it and it's not running,
| they're clean.
| rootusrootus wrote:
| This is exactly our algorithm as well. I can't really
| imagine flipping it the other way, since leaving dirty
| dishes in a dishwasher will just let them completely dry
| out, making it more likely they won't get fully clean
| when the cycle is eventually run.
| medstrom wrote:
| Rinse until visually clean, then put in dishwasher.
| teddyh wrote:
| That's not a zero-copy algorithm. The algorithm that uses the
| closed soap dispenser as a flag _is_ zero-copy.
| corpMaverick wrote:
| I want to have two dishwashers. One with the dirty dishes
| and one with the clean dishes. So you never have to put
| the dishes away. They go from the clean dishwasher to the
| table to the dirty one. And then flip them.
| sokoloff wrote:
| There's a community near here with a high fraction of
| Orthodox Jews. One condo I toured in my 20s had two
| dishwashers and without thinking about why they did it, I
| commented how I thought that was awesome that you'd never
| need to put dishes away. (They of course installed two
| dishwashers for orthodox separation of dishes from each
| other.)
| tjalfi wrote:
| This idea comes up periodically on Reddit. [0] has a few
| posts from people who have installed them, mostly for
| bachelors.
|
| [0] https://www.reddit.com/r/self/comments/ayr9c/when_im_
| rich_im...
| LilBytes wrote:
| Blasphemy! I do the inverse. You're wrong. /s
|
| _insert code flame war here_
| pdpi wrote:
| Not the OP, but I have the same problem. For some reason
| that escapes me, pressing the "10 sec" button 7 times
| produces 00 70 instead of 01 10. If you then press the "1
| min" button you get 01 70
| nerdponx wrote:
| Most microwaves (in the USA) do this, at least in my
| experience.
|
| They treat the ":" like a sum of two sexagesimal numbers,
| rather than a sexagesimal digit separator.
| mckirk wrote:
| How else would you prove it's turing complete and can run
| Doom?
| whytookay wrote:
| One that really surprised/confused me was that pyaml (and the
| yaml spec) attempts to interpret any 0-prefixed string into an
| octal number.
|
| There was a list of AWS Account IDs that parsed just fine until
| someone added one that started with a 0 and had no numbers
| greater than 7 in it, after which our parser started spitting
| out decidedly different values than we were expecting. Fixing
| it was easy, but figuring out what in the heck was going on
| took some digging.
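
A minimal sketch of the octal rule, assuming the YAML 1.1 convention that a leading `0` followed only by the digits 0-7 denotes an octal integer. `resolve_account_id` is a hypothetical, simplified resolver and the IDs are made up.

```python
import re

# Octal pattern modeled on the YAML 1.1 "int" type (assumption: a leading 0
# followed only by the digits 0-7 or underscores).
OCTAL = re.compile(r"[-+]?0[0-7_]+")


def resolve_account_id(scalar: str):
    """Mimic a YAML 1.1-style resolver: octal-looking IDs become integers."""
    if OCTAL.fullmatch(scalar):
        return int(scalar.replace("_", ""), 8)
    return scalar  # everything else is left as text in this sketch


print(resolve_account_id("012345670123"))  # silently becomes a different number
print(resolve_account_id("012345678901"))  # contains 8 and 9: left alone
```

Quoting the IDs in the YAML source avoids the conversion entirely.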
| adwn wrote:
| > _For the ease of entering time units YAML 1.1 parsed any set
| of two digits, separated by colons, as a number in sexagesimal
| (base 60)._
|
| This is a mind-boggling level of idiocy. Even leaving aside the
| MAC address problem, this conversion treats "11:15" (= 675)
| different from "11:15:00" (= 40500), even though those denote
| the same time, while treating "00:15:00" (15 minutes past
| midnight) and "15:00" (3 in the afternoon) the same.
| dragonwriter wrote:
| > YAML had a worse example, once.
|
| It had it literally at the same time as it had the problem in
| the article (the article refers to YAML 2.0, a nonexistent
| spec, and to PyYAML, a real parser which supports only YAML
| 1.1.)
|
| Both the unquoted-YES/NO-as-boolean and sexagesimal literals
| were removed in YAML 1.2. (As was the 0-prefixed-number-as-
| octal mentioned in a sibling comment.)
| lainga wrote:
| We had a Grafana dashboard where one of the columns was a short
| Git hash. One day, a commit got the hash `89e2520`, which
| Grafana's frontend helpfully decided to display as "+infinity".
| Presumably it was parsing 89E+2520.
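
A minimal sketch of that overflow. Grafana's frontend is JavaScript, so this Python version only illustrates the same double-precision behaviour; `as_number_if_possible` is a hypothetical stand-in for whatever display logic was actually involved.

```python
import math


def as_number_if_possible(token: str):
    """Hypothetical 'helpful' display logic: prefer a number when one parses."""
    try:
        return float(token)
    except ValueError:
        return token


value = as_number_if_possible("89e2520")
print(value, math.isinf(value))          # exponent is far beyond the double range
print(as_number_if_possible("89f2520"))  # not numeric, stays a string
```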
| rootusrootus wrote:
| Ha, that reminds me of some work I was doing just yesterday,
| implementing a custom dictionary for a postgres full text
| search index. Postgres has a number of mappings that you can
| specify, and it picks which one based on a guess of what the
| data represents. I got bit by a string token in this same
| format, because it got interpreted as an exponential number.
| ravanave wrote:
| Btw, the reason Haskell isn't used more isn't the type system per
| se, as all types can be inferred at compile time. People
| sometimes use this feature just to see whether GHCi guesses the
| type correctly the first time (by correctly I mean exactly how the
| user wants; technically it's always correct), which saves them
| some time writing it, either with an extension or by copy-pasting
| from the interpreter window.
|
| Where it gets hairy is that most programming languages have a low
| barrier to entry. To write Haskell effectively you've got to
| unlearn a lot of deep-rooted bad habits and dive into the
| "mathematical" aspect of the language. Not only have you got
| monads, but there's a plethora of other types you need to get
| comfortable with, plus a whole branch of mathematics about types
| (you don't even need to know that such a field as category theory
| exists to use it).
|
| However, since most people just want to write X, or just want to
| hire a dev team at a price they can afford, Haskell is rarely the
| first-choice language.
| enaaem wrote:
| Related xkcd
|
| https://xkcd.com/327/
| tpowell wrote:
| Little Bobby Tables. I came here to post this.
| lolinder wrote:
| This comment was buried in a thread, but I'm bringing it out
| because it's very relevant to the conversation:
|
| https://news.ycombinator.com/item?id=26679728
|
| > the article refers to YAML 2.0, a nonexistent spec, and to
| PyYAML, a real parser which supports only YAML 1.1.
|
| > Both the unquoted-YES/NO-as-boolean and sexagesimal literals
| were removed in YAML 1.2.
| keeperofdakeys wrote:
| This is part of more general problem, they had to rename a gene
| to stop excel auto-completing it into a date.
|
| https://www.theverge.com/2020/8/6/21355674/human-genes-renam...
|
| Edit: Apparently Excel has its own Norway Problem ...
| https://answers.microsoft.com/en-us/msoffice/forum/msoffice_...
| WalterBright wrote:
| Good language design involves deliberately adding redundancy
| which acts like a parity bit in that errors are more likely to
| be detected.
| richthegeek wrote:
| That's an interesting statement to apply to natural
| languages.
|
| Consider this headline in English: "Man attacks boy with
| knife". This can be read two ways, either the man is using a
| knife to attack the boy, or the boy had the knife and thus
| was being attacked.
|
| The same sentence in Polish would make use of either genitive
| or instrumental case to disambiguate (although barely).
| However, a naive translation would only differ in the
| placement of a `z` (with) and so errors could still slip
| through. At least in this case the error would not introduce
| ambiguity, simply incorrectness.
|
| Similar to language design we can also consider: does the
| inclusion/requirement of parity features reduce the
| expressivity of the language?
| tremon wrote:
| _does the inclusion/requirement of parity features reduce
| the expressivity of the language?_
|
| This was a real eye-opener for me when learning Latin in
| school: stylistic expressions such as meter, juxtaposition,
| symmetry are so much easier to include when the meaning of
| a sentence doesn't depend on word order.
| thaumasiotes wrote:
| > stylistic expressions such as meter, juxtaposition,
| symmetry are so much easier to include when the meaning
| of a sentence doesn't depend on word order.
|
| Eh.... some things are easy and some things are hard in
| any language. The specifics differ, and so do the details
| of what kinds of things you're looking for in poetry.
| Traditional Germanic verse focuses on alliteration.
| Modern English verse focuses on rhyme. Latin verse
| focuses on neither. [1]
|
| English divides poetically strong syllables from
| poetically weak syllables according to stress. It also
| has mechanisms for promoting weak syllables to strong
| ones if they're surrounded by other weak syllables.
|
| In contrast, Latin divides strong syllables from weak
| syllables by length. Stress is irrelevant. But while
| stress can be changed easily, you're much more restricted
| when it comes to syllable length -- and so Publius
| Ovidius Naso is _invariably_ referred to by cognomen in
| verse, because _it isn't possible_ to fit his nomen,
| Ovidius, into a Latin metrical scheme. That's not a
| problem English has.
|
| [1] I am aware of one exceptional Latin verse:
|
| > O Tite, tute, Tati, tibi tanta, tyranne, tulisti.
| [deleted]
| mcv wrote:
| The real problem here is that people use Excel to maintain
| data. Excel is terrible at that. But the fact that it may
| change data without the user being aware of it, is absolutely
| the biggest failing here.
| slightwinder wrote:
| The problem is more that it's insanely overpowered, while
| aiming for convenience out of the box. An "Excel Pro" version
| that takes away all the convenience and lets users configure
| that power precisely for their task might
| be a better solution. The funny part is, most of those things are
| already configurable now, but users are not educated enough
| about their tools to actually do it.
| mfer wrote:
| Excel allows people to maintain data all over the place. From
| golf league data to job actual data compared to estimates to
| so much more. And, excel is accessible enough that tens of
| millions (or maybe more) of people do it.
| qwertox wrote:
| Regarding Excel: It also happens with Somalia, which makes this
| issue even stranger. Apparently because of "SOM".
| bilalq wrote:
| I don't understand why those support agents for Microsoft just
| threw their hands up in the air and asked customers to go
| through some special process for reporting the bug in Excel.
| Why are they not empowered/able to report the issue on behalf
| of customers? It's so clearly a bug in Excel that even they are
| able to reproduce with 100% reliability.
| sneak wrote:
| It looks like it is intended behavior in Excel.
| njarboe wrote:
| Yes. Excel cells are set to a "General" format that, by
| default, tries to guess the type of data the cell should be
| from its content. A date-looking entry gets converted to a
| date type, and a number-looking string to a number (so 5.80 -->
| 5.8, very annoying since I believe in significant digits).
| When you import CSV data, for example, the default import
| format is "General", so date-looking strings will be changed
| to a date format. This can be avoided by importing the file
| and choosing to import the data as "Text". People having
| these data corruption problems forgot to do that.
|
| It's "user error" except that there is no way to set the
| default import to import as "Text" (as far as I know), so
| one has to remember to do the three step "Text" import
| every time instead of the default one step "General"
| import.
| dalbasal wrote:
| I suppose this is a cliched thought, but the more general
| problem is kind of emblematic of current "smart" features... and
| their expected successors.
|
| OOH, this is a typically human problem. We have a system.
| It's partly designed, partly evolved^. It's true enough to
| serve well in the contexts we use it in on most days. There are
| bugs in places (like norway, lol) that we didn't think of
| initially, and haven't encountered often enough to evolve
| around.
|
| In code, we call it bugs. In bureaucracy, we just call it
| bureaucracy. Agency A needs institution B's document X, in a
| way that has bugs.
|
| Obviously, it's also a typical machine problem. @hitchdev wants
| to tell pyyaml that Norway exists, and pyyaml doesn't
| understand. A user wants to enter "MARCH1" as text (or the name
| of a gene), and excel doesn't understand.
|
| Even the most rigid bureaucracy is made of people and has
| fairly advanced comprehension ability though. If Agency A,
| institution B or document X are so rigid that "NO" or "MARCH1"
| break them... it probably means that there's a machine bug
| behind the human one.
|
| Meanwhile... a human reading this blog (even if they don't
| program) can understand just fine from context and assumptions
| of intent.
|
| IDK... maybe I'm losing my edge, but natural language
| programming is starting to seem like a possibility to me.
|
| ^I feel like we need a new word for these: versioned, maybe?
| masklinn wrote:
| > This is part of more general problem
|
| The more general problem basically being sentinel values (which
| these sorts of inferences can be treated as) in stringly-typed
| contexts: if everything is a string and you match some of those
| for special consideration, you will eventually match them in a
| context where that's wholly incorrect, and break something.
| pdkl95 wrote:
| edit: fixed formatting problem
|
| > sentinel values
|
| Using in-band signaling always involves the risk of
| misinterpreting types.
|
| > This is part of more general problem
|
| DWIM ("Do What I Mean") was a terrible way to handle typos
| and spelling errors when Warren Teitelman tried it at Xerox
| PARC[1] over 50 years ago. From[2]:
|
| >> In one notorious incident, Warren added a DWIM feature to
| the command interpreter used at Xerox PARC. One day another
| hacker there typed
|
|     delete *$
|
| >> to free up some disk space. (The editor there named backup
| files by appending $ to the original file name, so he was
| trying to delete any backup files left over from old editing
| sessions.) It happened that there weren't any editor backup
| files, so DWIM helpfully reported
|
|     *$ not found, assuming you meant 'delete *'
|
| >> [...] The disgruntled victim later said he had been sorely
| tempted to go to Warren's office, tie Warren down in his
| chair in front of his workstation, and then type 'delete *$'
| twice.
|
| Trying to "automagically" interpret or fix input is always a
| terrible idea because you cannot discover the actual _intent_
| of an author from the text they wrote. In literary criticism
| they call this problem "Death of the Author"[3].
|
| [1] https://en.wikipedia.org/wiki/DWIM
|
| [2] http://www.catb.org/jargon/html/D/DWIM.html
|
| [3]
| https://tvtropes.org/pmwiki/pmwiki.php/Main/DeathOfTheAuthor
| wnoise wrote:
| Eh. "Death of the Author" is a reaction to the text not
| being dispositive as to what the author meant. It's
| deciding you don't care what the author meant, no longer
| considering it a problem that the text doesn't reveal that.
| Instead the text means whatever you can argue it means.
|
| Which can be a fun game, but is ultimately pointless.
| lisper wrote:
| >> [...] The disgruntled victim later said he had been
| sorely tempted to go to Warren's office, tie Warren down in
| his chair in front of his workstation, and then type
| 'delete $' twice.
|
| Ironically, this did not render the way you intended
| because HN interpreted the asterisk as an emphasis marker
| in this line.
|
| It works here: ... type 'delete *$'
| twice.
|
| because the line is indented and so renders as code, but
| not here:
|
| > ... type 'delete _$ ' twice.
|
| because the subsequent line has _emphasized text*. So the
| scoping of the asterisks is all screwed up.
| chrisdone wrote:
| That's a shrewd observation. Static types help with this
| somewhat. E.g. in Inflex, if I import some CSV and the string
| "00.10" is read as 0.1, then later when you try to do work on it like
|
| x == "00.10"
|
| You'll get a type error that x is a decimal and the string
| literal is a string. So then you know you have to reimport it
| in the right way. So the type system told you that an
| assumption was violated.
|
| This won't always happen, though. E.g. sort by this field
| will happily do a decimal sort instead of the string 00.10.
|
| The best approach is to ask the user at import time "here is
| my guess, feel free to correct me". Excel/Inflex have this
| opportunity, but YAML doesn't.
|
| That is, aside from explicit schemas. Mostly, we don't have a
| schema.
| alpaca128 wrote:
| > E.g. sort by this field will happily do a decimal sort
| instead of the string 00.10.
|
| So that system is not consistent with type checking? How is
| this not considered a bug?
| chrisdone wrote:
| I mean if the value is imported as a decimal, then a sort
| by that field will sort as decimal. This might not be
| obvious if a system imports 23.53, 53.98 etc - a user
| would think it looks good. It only becomes clear that it
| was an error to import as a decimal when we consider
| cases like "00.10". E.g, package versions: 10.10 is a
| newer version than 10.1.
|
| Types only help if you pick the right ones.
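
A minimal sketch of that last point in Python: the same strings order differently depending on whether they are read as decimals or as dotted versions. `version_key` is a hypothetical helper.

```python
from decimal import Decimal

raw = ["10.1", "10.10", "10.9"]

# Read as decimals, "10.1" and "10.10" are the same value.
print(Decimal("10.1") == Decimal("10.10"))
print(sorted(raw, key=Decimal))


def version_key(text: str):
    """Hypothetical helper: compare dotted versions component by component."""
    return tuple(int(part) for part in text.split("."))


# Read as versions, 10.10 comes after 10.9.
print(sorted(raw, key=version_key))
```

Both sorts are type-correct; only one of them matches what the data means.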
| dalbasal wrote:
| If we're talking about _general_ problems, then I don't
| think we can be satisfied with "_sometimes it's a problem
| with types and sometimes it's a UI bug_." That's not
| general.
| christophilus wrote:
| Basically, autoimmune disease, but for software.
| jgalt212 wrote:
| and cusips, which are strings, get converted to scientific
| notation.
|
| https://social.msdn.microsoft.com/Forums/vstudio/en-US/92e0a...
| afturkrull wrote:
| > they had to rename a gene to stop excel auto-completing it
| into a date.
|
| No one in their right mind uses a spreadsheet for data
| analysis. Good for working out your ideas but not in a
| production environment. I figure Excel was chosen as this was the
| utility the scientists were most familiar with.
|
| The proper tool for the job would be a database. I recall
| reading about a utility, a highly customized database with an
| interface that looks just like a spreadsheet.
| mattkrause wrote:
| The analysis itself isn't (usually) happening in Excel.
|
| A lot of tools operate on CSV files. People use Excel to peek
| at the results or prepare input for other tools, and that's
| how the date coercion slips in.
|
| Sometimes, people do use it to collate the results of small
| manual experiments, where a database might be overkill. Even
| so, the data is usually analyzed elsewhere (R, graphPad,
| etc).
| andrepd wrote:
| I'd say the more general problem is a bad type system! In any
| language with a half decent type system where you can define
| `type country = Argentina | ... | Zambia` this would be
| correctly handled at compile-time, instead of having strange
| dynamic weak typing rules (?) which throw runtime errors in
| production (???).
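
Python has no compile-time sum types, but a run-time approximation of the same idea can be sketched with an `Enum`. The country list is abbreviated and `parse_countries` is a hypothetical helper.

```python
from enum import Enum


class Country(Enum):
    GB = "GB"
    IE = "IE"
    FR = "FR"
    DE = "DE"
    NO = "NO"


def parse_countries(raw_values):
    """Reject anything that is not a known country code string."""
    result = []
    for raw in raw_values:
        if not isinstance(raw, str):
            # A loader that turned NO into False ends up here.
            raise TypeError(f"expected a country code string, got {raw!r}")
        result.append(Country(raw))  # raises ValueError for unknown codes
    return result


print(parse_countries(["GB", "NO"]))
```

With this check at the boundary, a coerced `False` fails loudly at load time instead of flowing into the rest of the program.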
| wayoutthere wrote:
| The one I've seen was a client who wanted to store credit card
| numbers in an Excel sheet (yes I know this is a bad idea, but
| it was 15 years ago and they were a scummy debt collection call
| center). Signed integers have a size limit, which a 16 digit
| credit card number significantly exceeds.
|
| Now, you and I know this problem is solved by prepending ' to
| the number and it will be treated as a string, but your average
| Excel user has no understanding of types or why they might
| matter. Many engineers will also look past this when generating
| Excel reports.
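
A minimal sketch of the precision loss, assuming the spreadsheet stores numbers as IEEE 754 doubles and keeps 15 significant digits, which is how Excel's behaviour is usually described. `keep_15_significant_digits` is a hypothetical helper and the card number is a made-up 16-digit value.

```python
def keep_15_significant_digits(n: int) -> int:
    """Round an integer to 15 significant digits, as a numeric cell might."""
    return int(float(f"{n:.14e}"))


card = 4111111111111111  # hypothetical 16-digit number

print(keep_15_significant_digits(card))   # the last digit is lost
print(str(card))                          # stored as text, nothing is lost
print(float(2**53) == float(2**53 + 1))   # doubles stop being exact at 2**53
```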
| zoward wrote:
| An even more general problem is that we as humans use pattern-
| matching as a cerebral tool to navigate our environment, and
| sometimes the patterns aren't what they appear to be. The
| Norway problem is the programming equivalent of an optical
| illusion.
| helsinkiandrew wrote:
| > they had to rename a gene to stop excel auto-completing
|
| I can just about understand that "No" might cause a problem,
| but "Membrane Associated Ring-CH-Type Finger 1" being converted
| to MAR-1 defeats me.
| jasode wrote:
| _> , but "Membrane Associated Ring-CH-Type Finger 1" being
| converted to MAR-1 defeats me._
|
| No, that's not what's happening. To clarify...
|
| If you type a _41-character_ string of _"Membrane
| Associated Ring-CH-Type Finger 1"_ into a cell -- Excel will
| _not_ convert that to a date of MAR-1.
|
| On the other hand, if you type the _6-char abbreviation_
| _"MARCH1"_, which _looks like a realistic date_ -- Excel
| converts it to MAR-1.
| jasode wrote:
| That author's blog post sent me down a rabbit hole of insanity
| with YAML and the PyYAML parser idiosyncrasies.
|
| First, he mentions "YAML 2.0", but there's no reference to a
| "2.0" on yaml.org or in Google/Bing searches. Yaml.org and
| Wikipedia say YAML is at 1.2. Other commenters in
| this thread clarified that the older "YAML 1.1" is what the
| author is referring to.
|
| Ok, if we look at the official YAML 1.1 spec[1], it has this
| excerpt for implicit bool conversions:
|
|     y|Y|yes|Yes|YES|n|N|no|No|NO
|     |true|True|TRUE|false|False|FALSE
|     |on|On|ON|off|Off|OFF
|
| But the pyyaml code excerpts[2][3] from resolver.py has this:
|
|     u'tag:yaml.org,2002:bool',
|     re.compile(ur'''^(?:yes|Yes|YES|n|N|no|No|NO
|                 |true|True|TRUE|false|False|FALSE
|                 |on|On|ON|off|Off|OFF)$''', re.X),
|
| The programmer _omitted_ the single character options of 'y' and
| 'Y' but it still has 'n' and 'N' ?!? The lack of symmetry makes
| the parser inconsistent.
|
| And btw for trivia... PyYAML also converts strings with leading
| zeros to numbers like MS Excel:
| https://stackoverflow.com/questions/54820256/how-to-read-loa...
|
| [1] https://yaml.org/type/bool.html
|
| [2] 2020 latest:
| https://github.com/yaml/pyyaml/blob/ee37f4653c08fc07aecff69c...
|
| [3] 2006 original :
| https://github.com/yaml/pyyaml/blob/4c570faa8bc4608609f0e531...
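
A minimal sketch built on the YAML 1.1 spec list quoted in the comment above: it checks which plain scalars would resolve to a boolean and quotes only those when writing a list. `needs_quotes` is a hypothetical helper, not a PyYAML function.

```python
import re

# Implicit boolean pattern from the YAML 1.1 bool type page
# (https://yaml.org/type/bool.html), as quoted in the comment above.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def needs_quotes(scalar: str) -> bool:
    """Return True if an unquoted scalar would be read as a boolean."""
    return YAML11_BOOL.fullmatch(scalar) is not None


countries = ["GB", "IE", "FR", "DE", "NO"]
for code in countries:
    # Quote only the values that would otherwise resolve to a boolean.
    print(f'- "{code}"' if needs_quotes(code) else f"- {code}")
```

A check like this could run in a test or pre-commit hook, in the same spirit as the yamllint `truthy` rule mentioned earlier in the thread.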
| atombender wrote:
| The world _desperately_ needs a replacement for YAML.
|
| TOML is fine for configuration, but not an adequate solution for
| representing arbitrary data.
|
| JSON is a fine data exchange format, but is not particularly
| human-friendly, and is especially poor for editable content:
| Lacks comments, multi-line strings, is far too strict about
| unimportant syntax, etc.
|
| Jsonnet (a derivative of Google's internal configuration
| language) is very good, but has failed to reach widespread
| adoption.
|
| Cue is a newer Jsonnet-inspired language that ticks a lot of
| boxes for me (strict, schema support, human-readable, compact),
| but has not seen wide adoption.
|
| Protobuf has a JSON-like text format that's friendlier, but I
| don't think it's widely adopted, and as I recall, it inherits a
| lot of Protobufisms.
|
| Dhall is interesting, but a bit too complex to replace YAML.
|
| Starlark is a neat language, but has the same problem as Dhall.
| It's essentially a stripped-down Python.
|
| Amazon Ion [1] is neat, but I've not seen any adoption outside of
| AWS.
|
| NestedText [2] looks promising, but it's just a Python library.
|
| StrictYAML [3] is a nice attempt at cleaning up YAML. But we need
| a new language with wide adoption across many popular languages,
| and this is Python only.
|
| Any others?
|
| [1] https://amzn.github.io/ion-docs/
|
| [2] https://nestedtext.org/
|
| [3] https://github.com/crdoconnor/strictyaml/
| svnpenn wrote:
| You seem pretty quick to disregard TOML. I switched all my JSON
| and YAML for TOML. Do you care to detail what is missing?
| atombender wrote:
| TOML quickly breaks down with lots of nested arrays of
| objects. For example:
|
|     a:
|       b:
|         - c: 1
|         - d:
|           - e: 2
|           - f:
|               g: 3
|
| Turns into this, which is unreadable:
|
|     [[a.b]]
|     c = 1
|     [[a.b]]
|     [[a.b.d]]
|     e = 2
|     [[a.b.d]]
|     [a.b.d.f]
|     g = 3
|
| TOML also has a few restrictions, such as not supporting
| mixed-type arrays like [1, "hello", true], or arrays at the
| root of the data. JSON can represent any TOML value (as far
| as I know), but TOML cannot represent any JSON value.
|
| At my company we use YAML a lot for table-driven tests (e.g.
| [1]), and this not only means lots of nested arrays, but also
| having to represent pure data (i.e. the expected output of a
| test), which requires a format that supports encoding
| arbitrary "pure" data structures of arrays, numbers, strings,
| booleans, and objects.
|
| [1] https://github.com/sanity-io/groq-test-suite/
| svnpenn wrote:
| Looks fine to me:
|
|     [[a.b]]
|     c = 1
|     d = [
|       { e = 2 },
|       { f = { g = 3 } }
|     ]
| timClicks wrote:
| An improvement, but the original YAML is still
| significantly better, in my opinion.
| Arnavion wrote:
| Also many (most? all?) serializers don't let you control
| which fields are serialized inline vs not. So if you have
| a program that _generates_ configuration, you're going
| to end up with the original unreadable form anyway.
| ak217 wrote:
| I don't think YAML is going anywhere, largely because it was
| the first format to prioritize readability and conciseness, and
| has used that advantage to achieve critical mass.
|
| It's far more productive to push for incremental changes to the
| YAML spec (or even a fork of it) to make it more sane and
| better defined. Things like a StrictYAML subset mode for
| parsers in other popular languages.
| dragonwriter wrote:
| > It's far more productive to push for incremental changes to
| the YAML spec
|
| The problems this article raises and strictyaml purports to
| address were addressed in YAML 1.2, already supported in
| python via ruamel.yaml; YAML 1.2 addresses much of this in
| the Core schema which is the closest successor to the default
| behavior of earlier spec versions, and does so more
| completely in the support for schemas more generally, which
| define both the supported "built-in" tags (roughly, types)
| and how they are matched from the low-level representation
| which consists only of strings, sequences, and maps (which,
| incidentally, are the only three tags of the "Failsafe"
| schema; there's also a "JSON" Schema between Failsafe and
| Core, which has tags corresponding to the types supported by
| JSON).
| fmakunbound wrote:
| XML and XML Schema solved this more than 20 years ago. It had
| to be replaced with JSON by the web developers though, so they
| could just "eval() it" to get their data.
| servercobra wrote:
| All except the easily written by humans part. Which is kind
| of a key part.
| jdeisenberg wrote:
| XML with RelaxNG (https://relaxng.org/) would have made life
| so much better than using XML Schema, but, as they say, that
| ship has long since sailed.
| MrPatan wrote:
| If all the smart people like you used XML, how come it was so
| painful to use and it died?
| [deleted]
| rayiner wrote:
| <humor>It died because web developers weren't bright enough
| to understand schemas.
|
| </humor>
| takeda wrote:
| Because it offered all these things parent responded, but
| that made it too complex. You either provide schema and get
| commodities of describing it or you don't.
|
| I had a chance of using SOAP at one point. It was an F5
| device and I used a python library. What I really liked is
| that when it connected to it it downloaded its schema, and
| then used that to generate an object. At that point you
| just communicated with device like you did with any object
| in Python.
|
| We abandoned it for inferior technologies like REST and
| JSON, because SOAP and XML were harder to use from JS, as
| parent mentioned.
| MrPatan wrote:
| Parent didn't say it was harder to use from JS. Parent
| said "It had to be replaced with JSON by the web
| developers though, so they could just "eval() it" to get
| their data."
|
| First of all, I was there 20 years ago. I had to deal
| with XML, XSLT, one kind of Java XML parsers that didn't
| fully do what I needed, another kind of Java XML parsers
| that didn't fully do what I needed. And oh boy was it a
| pain. I just wanted to get a few properties of a bunch of
| entities in a bigger XML document, that's all. Big fail.
|
| Second, JSON always had a parser in JS, so I don't know
| where that eval nonsense is coming from.
|
| Third, JS actually had the best dev UX for XML of all
| languages 20 years ago. Maybe you know JavaScript from
| Node.js, but 20 years ago it used to run exclusively in
| web browsers, which even then were pretty good at parsing
| XML documents. The browser of course had a JS DOM
| traversal API known to every single JS developer, and
| very soon (Although TBH I can't remember if before or
| after JSON) it also had xpath querying functions, all
| built in.
|
| XML was _so bad_ that its replacement came from the
| language where it was actually easiest to use. Think
| about that for a second.
|
| So the answer to the question "Why was XML replaced?" is
| not "Because webdevs lol".
|
| I suspect it was because it has both content and
| attributes, which all but guarantees it's impossible to
| create a bunch of simple, common data structures from it
| (like JSON does).
| [deleted]
| ng12 wrote:
| Jsonnet hasn't taken off because it's turing complete. It's a
| really great language for generating JSON but not a replacement
| for JSON.
| diggan wrote:
| Seems you're missing my personal favorite, extensible data
| notation - EDN (https://github.com/edn-format/edn). Probably
| I'm a bit biased coming from Clojure as it's widely used there
| but haven't really found a format that comes close to EDN when
| it comes to succinctness and features.
|
| Some of the neat features: Custom literals / tagged elements
| that can have their support added for them on runtime/compile
| time (dates can be represented, parsed and turned into proper
| dates in your language). Also being able to namespace data
| inside of it makes things a bit easier to manage without having
| to resort to nesting or other hacks. Very human friendly, plus
| machine friendly.
|
| Biggest drawback so far seems to be performance of parsing,
| although I'm not sure if that's actually about the format
| itself, or about the small adoption of the format and therefore
| not many parsers focusing on speed have been written.
| mc10 wrote:
| S-expressions are super easy to parse and are fairly easy for
| humans to read. See e.g. using s-expressions in OCaml:
| https://dev.realworldocaml.org/data-serialization.html
| Nihilartikel wrote:
| Apropos of this, in Clojure-land the idiomatic serialization
| is EDN [1], which is pretty ergonomic to work with IMO,
| since in most cases it is the same as a data-literal in
| Clojure.
|
| My feeling is that :keywords reduce the need and temptation
| to conflate strings and boolean/enumerations that occurs when
| there's no clear way to convey or distinguish between a
| string of data and a unique named 'symbol'. I miss them when
| I'm in Pythonland.
|
| [1] https://www.compoundtheory.com/clojure-edn-walkthrough/
| gnud wrote:
| S-expressions inherit all trouble with data types from json
| (dates, times, booleans, integer size, number vs numeric
| string).
|
| You get neat ways of nesting data, but that is not enough for
| a robust and mistake-resilient configuration language.
|
| The problem isn't parsing in itself. The problem is having
| clear semantics, without devolving into full SGML DTDs (or
| worse still, XML schemas).
| diggan wrote:
| > S-expressions inherit all trouble with data types from
| json (dates, times, booleans, integer size, number vs
| numeric string).
|
| Hm, not sure that's true, S-expressions would only define
| the "shape" of how you're defining something, not the
| semantics of how you're defining something. EDN
| https://github.com/edn-format/edn for all intents and
| purposes is S-expressions and has support for custom
| literals and more, to avoid "the trouble with data types
| from JSON".
| rubyn00bie wrote:
| Your list is like a graveyard of my dreams and hopes. Anything
| that doesn't validate the format of the underlying data is
| pretty much dead to me...
|
| The problem with most of these is they're useless to describe
| the data. Honestly, it is completely not useful to have the
| following to describe data:
|
| email => string
|
| name => string
|
| dob => string
|
| IMHO, it is akin to having a dictionary (like Oxford English)
| read like:
|
| email - noun
|
| name - noun
|
| birthday - noun
|
| It says next to nothing except, yes, they are nouns. All too
| often I waste time fighting nils and bullshit in fields or
| duplicating validation logic all over the place.
|
| "Oh wow, this field... is a string..? That's great... _smiles
| gently_ except... THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID,
| SCHEMA-CHUD. GET THE FUCK OFF MY LAWN! "
| scythe wrote:
| If you want automatic built-in string validation, one option
| that seems particularly interesting is to use a variant of
| Lua patterns, which are weaker and easier to understand than
| regular expressions, but still provide a significant degree
| of "sanity" for something like an email. The original version
| works on bytes and not runes, but you could simply write a
| parser that works on runes instead, and the pattern-matching
| code is just 400 old and battle-tested lines of C89. You
| might want to add one extension: allow for escape sequences
| to be treated as a single character (hence included in
| repetition operators and adding the capability to match
| quoted strings); with this extension, I think you could
| implement full email address validation:
|
| https://i.stack.imgur.com/YI6KR.png
|
| Lua patterns have also shown up in other places, such as
| BSD's httpd, and an implementation for Rust:
|
| https://www.gsp.com/cgi-bin/man.cgi?section=7&topic=PATTERNS
|
| https://github.com/stevedonovan/lua-patterns
|
| http://lua-users.org/wiki/PatternsTutorial
| geoduck14 wrote:
| >THERE SHOULD NOT BE EMOJI IN MY FUCKING UUID
|
| thanks for the lolz
| Nitramp wrote:
| My experience is that validation quickly becomes surprisingly
| complex, to the point of being infeasible to express in a
| message format.
|
| Not only are the constraints very hard to express (remember
| that one 2000 char regexp that really validates email
| addresses?), they are also contextual: the correct validation
| in an Android client is not the same as on the server side.
| Eg you might want to check uniqueness or foreign key
| constraints that you cannot check on the client. Sometimes
| you want to store and transmit invalid messages (eg partially
| completed user input). And then you have evolving validation
| requirements: what do you do with the messages from three
| years ago that don't have field X yet?
|
| Unfortunately I don't think you can express what you need in
| a declarative format. Even minimal features such as regexp
| validation or enums have pitfalls.
|
| I think it's better to bite the bullet and implement the
| contextually required validation on each system boundary, for
| any message crossing boundaries.
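|
| A minimal sketch of what validating at a system boundary could
| look like, using only the Python standard library. The record
| layout (id, dob, email) and the function name validate_user are
| hypothetical, chosen to echo the fields mentioned upthread, and
| the email check is deliberately loose.

```python
import datetime
import uuid


def validate_user(record):
    """Return a list of problems found in a decoded message (hypothetical schema)."""
    problems = []
    try:
        # uuid.UUID rejects anything that is not 32 hex digits.
        uuid.UUID(record.get("id", ""))
    except (ValueError, TypeError, AttributeError):
        problems.append("id is not a UUID")
    try:
        datetime.date.fromisoformat(record.get("dob", ""))
    except (ValueError, TypeError):
        problems.append("dob is not an ISO date")
    # Deliberately loose: full email validation is out of scope here.
    if "@" not in str(record.get("email", "")):
        problems.append("email has no @")
    return problems


good = {
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "dob": "1990-04-02",
    "email": "a@example.com",
}
bad = {
    "id": "123e4567-e89b-12d3-a456-42661417400\N{GRINNING FACE}",
    "dob": "02/04/1990",
    "email": "nobody",
}

print(validate_user(good))
print(validate_user(bad))
```

| Each boundary (client, server, importer) could carry its own
| variant of such a function, which is the contextual part of the
| argument above.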
| sangnoir wrote:
| It sounds to me like XML with a DTD & XSD would solve your
| problem. XML is no longer fashionable, but its validation is
| Turing-complete.
| tormeh wrote:
| I agree with this, something RON/JSON-like with type
| annotations would be great:
|
|     {
|       "isTrue":false:Boolean,
|       "id":"123e4567-e89b-12d3-a456-426614174000":UUID
|     }
| djedr wrote:
| Still early, but here's my baby I hope can improve things:
|
| website with grammar spec: https://tree-annotation.org/
|
| prototype of a JSON/YAML alternative for JS:
| https://github.com/tree-annotation/tao-data-js
|
| same thing, even less finished for C#:
| https://github.com/tree-annotation/tao-data-csharp
|
| working on it constantly, more to come soon
| dragonwriter wrote:
| > The world desperately needs a replacement for YAML.
|
| The world desperately needs support for YAML 1.2, which solves
| the problems the article addresses fairly completely (largely
| in the "default" Core schema[0], but more completely with the
| support for schemas in general), plus a bunch of others, and
| has for more than a decade. But YAML 1.2 libraries aren't
| available for most languages.
|
| [0] not actually an official default, but reflects a cleanup of
| the YAML 1.1 behavior without optional types, so it's
| defaultish. Back when it looked like YAML 1.3 might happen in
| some reasonably-near future, it was actually indicated by team
| members that the JSON Schema for YAML (not to be confused with
| the JSON Schema spec) would be the explicit default YAML Schema
| in 1.3, which has a lot to recommend it.
| tormeh wrote:
| Nope nope nope. YAML is awful and needs to die. The more you
| look at it the worse it gets. The basic functionality is
| elegant (at least until you consider stuff like The Norway
| Problem), but the advanced parts of YAML are batshit insane.
| dragonwriter wrote:
| "The Norway Problem" is a YAML 1.1 problem, of which there
| are many.
|
| What advanced parts of YAML are you talking about that
| remain problems in YAML 1.2?
| medstrom wrote:
| From the article:
|
| > The most tragic aspect of this bug, however, is that
| it is intended behavior according to the YAML 2.0
| specification.
| dragonwriter wrote:
| The article is simply, factually wrong; there is no "YAML
| 2.0 specification" [0], and everything they point to is
| YAML 1.1, and addressed in YAML 1.2 (the most recent YAML
| spec, from 2009.)
|
| [0] https://yaml.org/
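|
| To make the 1.1 versus 1.2 difference concrete, here is a small
| sketch of implicit boolean resolution using only Python's re
| module. The two patterns follow the boolean spellings listed in
| the YAML 1.1 bool type and in the YAML 1.2 Core schema; the
| function name is_implicit_bool is made up for illustration, and
| real libraries may implement only part of the 1.1 list.

```python
import re

# Boolean spellings from the YAML 1.1 bool type definition.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)

# The YAML 1.2 Core schema only recognises true/false spellings.
YAML12_CORE_BOOL = re.compile(r"true|True|TRUE|false|False|FALSE")


def is_implicit_bool(scalar, pattern):
    """Return True if an unquoted scalar would be resolved as a boolean."""
    return pattern.fullmatch(scalar) is not None


for code in ["NO", "no", "SE", "true"]:
    print(
        code,
        is_implicit_bool(code, YAML11_BOOL),
        is_implicit_bool(code, YAML12_CORE_BOOL),
    )
```

| Under the first pattern the country code NO resolves as a
| boolean; under the second it stays a string.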
| geraldbauer wrote:
| You might look at JSON Next variants (if you remember -
| "classic" JSON is a subset of YAML), see
| https://github.com/json-next/awesome-json-next
|
| My own little JSON Next entry / format is called JSON 1.1 or
| JSONX, that is, JSON with eXtensions, see
| https://json-next.github.io
| orthoxerox wrote:
| The list is missing http://www.relaxedjson.org/
|
| Also, there's no explanation what <..-..> and <..+..> do.
| tormeh wrote:
| Also RON: https://github.com/ron-rs/ron
|
| A bit like JSON5, but I believe even more advanced.
| hansvm wrote:
| > The world desperately needs a replacement for YAML.
|
| For situations like TFA you really want a configuration
| language that behaves exactly like you think it will, and since
| you don't have to interop with other organizations you don't
| really need a global standard.
|
| Moreover, broadly used config languages can be somewhat
| counterproductive to that goal. Take JSON as an example;
| idiomatic JSON serdes in multiple programming languages has
| discrepancies in minint, maxfloat, datetime, timezone, round-
| tripping, max depth, and all kinds of other nuanced issues.
| Existing tooling is nice when it does what you expect, but for
| a no-frills, no-surprises configuration language I would almost
| always just prefer to use the programming language itself or
| otherwise write a parser if that doesn't suffice (e.g., in
| multilingual projects).
|
| Mildly off-topic: The problem here, more or less, was that the
| configuration change didn't have the desired effect on an in-
| memory representation of that configuration. We can mitigate
| that at the language level, but as a sanity check it's also a
| good idea to just diff the in-memory objects and make sure the
| change looks kind of like what you'd expect.
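|
| A rough sketch of that sanity check: diff the parsed configuration
| before and after a change and look at what actually moved. The
| helper name diff_config and the sample data are hypothetical; the
| type comparison is there so that a string turning into a boolean
| shows up as a change.

```python
_MISSING = object()  # marks a key that exists on only one side


def diff_config(old, new, path=""):
    """Yield (path, old_value, new_value) for every leaf that differs."""
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new), key=str):
            yield from diff_config(
                old.get(key, _MISSING),
                new.get(key, _MISSING),
                f"{path}/{key}",
            )
    elif type(old) is not type(new) or old != new:
        yield path, old, new


# Intended change: add Norway. What the parser produced: False.
before = {"countries": ["DK", "SE"], "debug": False}
after = {"countries": ["DK", "SE", False], "debug": False}

changes = list(diff_config(before, after))
print(changes)
```

| Lists are compared as a whole here, which is enough to make the
| unexpected False visible in review.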
| atombender wrote:
| You don't need wide adoption for internal projects in an
| organization, but you _do_ want great toolchain support.
|
| For example, the fact that NestedText is a Python library
| means a Python team could use it, but it's a poor fit for an
| organization whose other teams use Go and
| JavaScript/TypeScript.
|
| We use YAML for much more than configuration, by the way. I
| feel like YAML hits a nice sweet spot where it's usable for
| almost everything.
| IshKebab wrote:
| JSON5 is the best option currently. A fair number of tools in
| the JS ecosystem support it.
| atombender wrote:
| JSON5 is better than JSON on my points, but it has downsides
| compared to YAML. For example, YAML is _very_ good at
| multiline strings that don't require any sort of quoting,
| and knows to remove preceding indentation:
|
|     foo: |
|       "This is a string that goes across
|       multiple lines," he wrote.
|
| In JSON5, you'd have to write:
|
|     {
|       foo: "\"This is a string that goes across \
|     multiple lines,\" he wrote."
|     }
|
| This sort of ergonomic approach is why YAML is so well-liked,
| I think. (Granted, YAML's use of obscure Perl-like sigils to
| indicate whitespace mode is annoying, but it does cover a lot
| of situations.)
|
| YAML is also great at arrays, mimicking how you'd write a
| list in plaintext:
|
|     foo:
|     - "hello"
|     - 42
|     - true
| dqpb wrote:
| I've used most of the technologies you listed. Cue is the best,
| and the only one with strong theoretical foundations. I've been
| using it for some time now and won't go back to the others.
| debug-desperado wrote:
| Thanks for this list, I've never heard of Ion. I'll consider it
| for config and even replacing Avro & Protobuf in future
| projects.
| joshxyz wrote:
| This is why i love JSON. It's only string, number, boolean,
| arrays, objects/dictionaries, unless you write custom serializer
| and deserializers..
| lokedhs wrote:
| Except that its numbers are underspecified and cannot be used
| safely outside of a certain range. The spec explicitly states
| that the precision of numbers is not defined, meaning that N
| and N+1 may be the same number, and its behaviour would depend
| on the parser you're using.
|
| The number one rule when creating a serialisation format should
| be that serialisation and deserialisation is predictable. It's
| quite remarkable that two of the most popular formats don't
| do this.
|
| I'm actually surprised we haven't seen any major security
| issues caused by this.
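|
| A short sketch of the N versus N+1 effect, using Python's standard
| json module. Python keeps integers exact by default, so the
| parse_int hook (a documented json.loads parameter) is used here to
| emulate a parser that stores every number as a 64-bit float; that
| emulation is an assumption for illustration, not a claim about any
| particular parser.

```python
import json

big = 2**53 + 1  # not exactly representable as a 64-bit float
text = json.dumps(big)

# Python's json module keeps integers exact by default.
exact = json.loads(text)

# Emulate a parser that stores every number as a double.
as_double = json.loads(text, parse_int=float)

print(exact == big)               # the exact integer survives
print(int(as_double) == big)      # the double-based parse does not
print(as_double == float(2**53))  # N+1 has collapsed onto N
```

| The same document therefore yields different values depending on
| the parser, which is the predictability problem described above.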
| bmn__ wrote:
| The problem is insufficiently analysed by the article author and
| the commenters in this thread so far. It is very superficial. The
| recent thread "Can't use iCloud with "true" as the last name"
| https://news.ycombinator.com/item?id=26364993 went deeper. Let me
| take up its relevant particulars into this thread.
|
| The article author hitchdev does not say it outright, but it is
| heavily implied that the YAML file was edited by hand. This is
| the immediate cause of the problem. The indirect root of the
| problem is that the spec authors chose a plain text serialisation
| format and thus created an _affordance_
| http://enwp.org/Affordance#As_perceived_action_possibilities to
| be edited by hand.
|
| This turns out to be unsafe and a source of bugs because YAML
| end-users are not capable of correctly applying the
| serialisation rules in the edge cases detailed in the article:
| humans are creatures of habit who apply analogy and common sense,
| make assumptions, and sometimes go wrong, whereas a piece
| of software will not make the Norway, Null etc. mistakes.
| hitchdev even writes that quoting the string is "a fix for sure,
| but kind of a hack", but that's a grave misunderstanding. Quoting
| the string here is actually applying the serialisation rules
| correctly.
|
| The tangent at the end of the article about typing is also
| orthogonal/irrelevant. YAML is strictly/strongly/unambiguously
| typed, and so is the mentioned variant Strict YAML. The
| difference is that Strict YAML has serialisation rules that are
| more amenable to or aligning with the human factors of habit etc.
| and thus work better in practice.
|
| My personal recommendation is to never edit YAML by hand and
| always use a serialiser. This is less convenient, but safe.
|
| In closing, I would like the reader of this comment to make an
| effort to distinguish between "what is" and "what ought to be" in
| their head, otherwise the ideas here will be very muddled.
| dragonwriter wrote:
| > The problem is insufficiently analysed by the article author
|
| The article author also misidentifies the version of the YAML
| spec (calling it 2.0, which doesn't exist; the behavior is from
| YAML 1.1, and this class of problems motivated a bunch of
| changes in YAML 1.2, which has been out since 2009.)
|
| But the article author isn't trying to analyze the problem,
| he's trying to rationalize why what is notionally a YAML-
| processing library just ignores the spec.
| Aeolun wrote:
| > never edit YAML by hand and always use a serialiser
|
| I don't follow this. If yaml is your config format, and you are
| not editing it by hand, what are you editing?
| bmn__ wrote:
| I work on the deserialisation. This is a one-liner in many
| programming languages.
| sfvisser wrote:
| The problem is not 'someone is not correctly following the
| serialization rules', the problem is 'the serialization rules
| are quite terrible'.
|
| This is not some interesting trade-off, this problem is fixable
| on all axes by using non-ambiguous, non-overloaded typing rules
| for your config format.
|
| Even JSON and XML got this right.
| bmn__ wrote:
| > The problem is not 'someone is not correctly following the
| serialization rules'
|
| Yes, yes, I pointed that out. grep "immediate cause" and
| "indirect root"
|
| > the serialization rules are quite terrible
|
| Did that need to be said explicitly? I agree FWIW. I have
| already made a value judgement mildly against YAML, in case
| that's not clear. It's only mild because the problem can be
| worked around. I think this approach is more practical than
| moving the whole world over to a completely different thing.
|
| > problem is fixable [...] non-ambiguous [...] rules
|
| Is the implication here that you say YAML is ambiguous? It's
| not. I don't want sloppy analysis. To be precise, the
| ambiguity is imagined, it does not exist on the spec or
| software level, only in the head of people.
| atleta wrote:
| The very point of yaml is that it is _easy_ to edit by hand. If
| you use, I suppose, a GUI editor then you don't need yaml.
| You could use any strictly typed serialization format. (Self
| describing or with a schema.)
| NaturalPhallacy wrote:
| This is why implicit typing is an _invitation_ to errors.
| paulintrognon wrote:
| I am sometimes annoyed by the fact you have to put double quotes
| around string properties in JSON. It would be so much lighter to
| use JS syntax..! Then I read articles like this one. Thank you
| JSON for not trying to be smart.
| teddyh wrote:
| This is another good argument against weak types in general.
| Strong types are better, and explicit is better than implicit.
| pintxo wrote:
| > "While the website went down and we were losing money we chased
| down a number of loose ends until finally finding the root
| cause."
|
| And that's why you have a staging environment. Or you debug in
| production, whatever you prefer.
| atoav wrote:
| I'd go further and say _this is why you write tests_. Creating
| tests that cover a lot (or all) possible inputs is sometimes
| not that hard and really pays off if you manage to catch a very
| common error like the Norway thing. Even better if you catch
| something that would have been a nightmare to fix in
| production.
|
| I say this because two days ago I wrote a test that used all
| country codes as input. It took 15 minutes to write that test.
| During the whole testing session I found at least 5 mistakes of
| which 3 would have been quite dramatic.
| simion314 wrote:
| >I say this because two days ago I wrote a test that used all
| country codes as input. It took 15 minutes to write that
| test. During the whole testing session I found at least 5
| mistakes of which 3 would have been quite dramatic.
|
| And how many minutes to test all
| city/state/region/street/person names ?
|
| It can also happen that your tests will become outdated, like
| when the URL standard changed and more character codes were
| allowed.
| mrighele wrote:
| Or you just return to the previous (and working) version of the
| website while you fix the issue. At least if you have a good old
| monolith; if you have 10s of microservices it may be more
| complicated.
| groundCode wrote:
| Bugs make it to production no matter how careful you are.
|
| What matters is how you deal with incidents as an organisation,
| not that you should never release a bug.
| eitland wrote:
| Everybody has a testing environment. Some people are lucky
| enough to have a totally separate environment to run
| production in.
|
| https://mobile.twitter.com/stahnma/status/634849376343429120
| pietroppeter wrote:
| I like strict yaml but I have used it very little. Anyone who
| uses it more that can give more feedback?
| [deleted]
| grenoire wrote:
| I was helping out a friend of mine in the risk department of a
| Big 4; he was parsing CSV data from a client's portfolio. Once he
| started parsing it, he was getting random NaNs (pandas' nan type,
| to be more accurate).
|
| I couldn't get access to the original dataset but the column gave
| it away. Namibia's 2-letter ISO country code is NA--which happens
| to be in pandas' default list of NaN equivalent strings.
|
| It was a headache and a half...
| mseepgood wrote:
| A Ms True also broke Apple's iCloud:
| https://twitter.com/RachelTrue/status/1365461618977476610
| grenoire wrote:
| That looks like an interesting hard-coded check, I wonder
| what it intended to fix.
| fanf2 wrote:
| There's some analysis in this twitter thread:
| https://twitter.com/badedgecases/status/1368362392573317120
|
| tl;dr: there are a bunch of fields of various types that
| arrive as strings, and they get coerced but without paying
| attention to which field should have which type
| grenoire wrote:
| Verbatim from the docs, on read_csv:
|
|     na_values : scalar, str, list-like, or dict, default None
|         Additional strings to recognize as NA/NaN. If dict passed,
|         specific per-column NA values. By default the following
|         values are interpreted as NaN: '', '#N/A', '#N/A N/A',
|         '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND',
|         '1.#QNAN', '<NA>', 'N/A', 'NA', 'NULL', 'NaN', 'n/a',
|         'nan', 'null'.
|
| You fix it by using `keep_default_na=False`, by the way.
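|
| The mechanism can be sketched without pandas. The loader below is
| a hypothetical stand-in written with the standard csv module: it
| maps a subset of the NA strings quoted above to None unless told
| otherwise. It is not pandas code; the keep_default_na name is
| borrowed from the comment purely for the analogy.

```python
import csv
import io

# A subset of the default NA strings quoted above from the pandas docs.
DEFAULT_NA = {"", "#N/A", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null"}


def load_rows(text, keep_default_na=True):
    """Hypothetical loader: map NA-like strings to None unless told not to."""
    na = DEFAULT_NA if keep_default_na else set()
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({k: (None if v in na else v) for k, v in row.items()})
    return rows


data = "country,name\nNA,Namibia\nNO,Norway\n"

print(load_rows(data))                         # Namibia's code is lost
print(load_rows(data, keep_default_na=False))  # codes kept as strings
```

| With the default list active, Namibia's country code is
| indistinguishable from a missing value; opting out keeps it as
| the string it was.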
| dragonwriter wrote:
| It's weird that this is a 2019 article misrepresenting behavior in
| the YAML 1.1 spec (2005), most of which was reverted in the YAML 1.2
| spec (2009) as being part of a nonexistent YAML 2.0 spec and
| justifying a library that purports to handle "YAML" ignoring the
| spec.
| atombender wrote:
| You're right, but it's worth noting that much of the world is
| still on YAML 1.1, for whatever reason, so _in practice_ ,
| these are actual problems that will be encountered in the real
| world.
|
| For example, Ruby's standard library only supports YAML 1.1. It
| relies on libyaml, which is not yet compliant with 1.2.
| Meanwhile, Python's popular PyYAML library only supports 1.1,
| and asks users to migrate to a newer fork called ruamel.yaml
| for 1.2 support.
| dragonwriter wrote:
| > You're right, but it's worth noting that much of the world
| is still on YAML 1.1
|
| This is an article justifying use of (and justifying design
| decisions of) a particular Python quasi-YAML parsing library.
| If you are in a position to select a non-YAML-1.1-compliant
| parsing library for Python, or to take the article's advice on
| design of a YAML(-ish) parsing library, you are, necessarily,
| _not_ stuck with YAML 1.1.
|
| > for whatever reason
|
| Articles like this spreading misinformation about the current
| state of standard YAML are part of the reason. LibYAML
| lagging support is another since so much of the ecosystem
| depends on libYAML (though, while the documentation situation
| is terrible, it looks like maybe libYAML has some level of
| 1.2 support since 0.23.)
|
| > For example, Ruby's standard library only supports YAML
| 1.1. It relies on libyaml, [...] Python's popular PyYAML
| library only supports 1.1
|
| Which, also, is dependent on libYAML.
|
| > and asks users to migrate to a newer fork called
| ruamel.yaml for 1.2 support.
|
| Which makes a lot more sense than migrating to a library that
| supports neither 1.1 nor 1.2, but a nonstandard variant that
| addresses some of the same issues resolved years ago in 1.2,
| especially when a library supporting 1.2 is available for the
| same language.
| namelosw wrote:
| I prefer JSON over YAML because I spend more time confused and
| burned by the problems YAML causes.
|
| I understand that people don't like to use JSON directly because
| it's not very friendly: no comments, no multi-line string, etc.
|
| A great alternative IMHO is cson[0]. It's like JSON to JavaScript
| but for CoffeeScript (though nobody talks about it nowadays). It
| has indentation-based syntax, comments, and multiline string
| which usually don't need to escape. The advantage is it's close
| enough to JSON which is the canonical format that everybody can
| agree on nowadays. For YAML and TOML there are too many visual
| part-aways from JSON.
|
| Or just create a JSON variant that enables comments and the
| backtick multiline string from JavaScript.
|
| [0] https://github.com/bevry/cson
| jmartrican wrote:
| It seems like we need to treat yaml like json and quote all
| strings. Would that help resolve these issues? Just trying to
| figure out a rule I can implement to prevent these issues.
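|
| One way to enforce a "quote everything" rule is to generate the
| file instead of typing it. The sketch below JSON-encodes every key
| and scalar so that strings always come out double-quoted; it
| relies on the point made elsewhere in this thread that JSON is
| accepted as YAML (strictly so under YAML 1.2). The function name
| emit_flat_mapping is hypothetical, and it only handles a flat
| mapping of scalars.

```python
import json


def emit_flat_mapping(mapping):
    """Emit one 'key: value' line per entry with keys and values JSON-encoded.

    Strings are always double-quoted, so a country code such as NO
    cannot be mistaken for a boolean by an implicit-typing parser.
    """
    return "\n".join(
        f"{json.dumps(key)}: {json.dumps(value)}"
        for key, value in mapping.items()
    )


print(emit_flat_mapping({"country": "NO", "enabled": False, "port": 8080}))
```

| Booleans and numbers are still written bare, so the only
| unquoted scalars are the ones that are meant to be typed.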
| WalterBright wrote:
| > The most tragic aspect of this bug, however, is that it is
| intended behavior according to the YAML 2.0 specification.
|
| This is one of those great ideas that sadly one needs experience
| to realize are really bad ideas. Every new generation of
| programmers has to relearn it.
|
| Other bad ideas that resurface constantly:
|
| 1. implicit declaration of variables
|
| 2. don't really need a ; as a statement terminator
|
| 3. assert should not abort because one can recover from assert
| failures
| atleta wrote:
| I agree with the general observation, but the need for ";" ?
| Quite a few languages (over a few generations) have been doing
| fine without the semicolon. Just to mention two: python and
| haskell. (Yes, python has the semicolon but you'll only ever
| use it to put multiple statements on a single line.)
| Cu3PO42 wrote:
| Haskell has the semicolon for the same reason!
| lelanthran wrote:
| > I agree with the general observation, but the need for ";"
| ? Quite a few languages (over a few generations) have been
| doing fine without the semicolon. Just to mention two: python
| and haskell. (Yes, python has the semicolon but you'll only
| ever use it to put multiple statements on a single line.)
|
| But then it's inconsistent and has unnecessary complexity
| because now there's one (or more) exceptions to the rules to
| remember: when the ';' is needed. And of course if you get it
| wrong you'll only discover it at runtime.
|
| "Consistent applications of a general rule" is preferable to
| "An easier general rule but with exceptions to the rule".
| labawi wrote:
| When you use ; and possibly {, }, code statements / blocks
| are specified redundantly (indentation + separators), which
| can cause inconsistent interpretation of code by compiler /
| readers.
|
| I find it much, much easier to look at code and parse
| blocks via indentation, than the many ways and exceptions
| of writing ; and {, }, while an extra or missing ';' or {}
| easily remains unspotted and leads to silly CVEs.
| nightcracker wrote:
| Have you ever used Python? If you did you really wouldn't
| be saying this. There isn't an exception. The semicolon is
| used to put multiple statements on a single line. That's
| its only use, and that's the only time it's 'needed' - no
| exceptions.
| lelanthran wrote:
| > Have you ever used Python? If you did you really
| wouldn't be saying this. There isn't an exception.
|
| For the ';', perhaps not. For the token that is used to
| terminate (or separate) statements? Yes, the ';' is an
| exception to the general rule of how to terminate
| statements.
|
| The semicolon also works on some sort of statements and
| not others, throwing errors only at runtime.
|
| It's easier to remember one rule than many.
| pedrovhb wrote:
| Honestly, the rule is "don't use semicolons in Python". I
| don't think there's a single one in the large codebase I
| work with, and there's really no reason at all to use it
| other than maybe playing code golf.
|
| It's not a language in which you ever need to be saving
| bytes on the source code. Just use a new line and indent.
| It's more readable and easier.
| c-cube wrote:
| But python has instead the "insert \ sometimes" rule,
| which isn't better.
| atleta wrote:
| There are no exceptions. You only need it if/when you want
| to put multiple statements on a single line. That's its
| sole purpose.
|
| And I'd also add that it's something that you almost never
| do. One practical use is writing single line scripts that
| you pass to the interpreter on the command line. E.g.
| `python -c 'print("first command"); print("second
| command")'`
|
| If you don't know about the `;` at all in python then you
| are 100% fine.
| ufo wrote:
| Another interesting example is Lua. It's a free form language
| without semicolons. It's not indentation sensitive.
| samatman wrote:
| Lua does have semicolons!
|
| It even has semicolon insertion, but because the language
| is carefully designed, this doesn't cause problems, and
| most users can go a lifetime without knowing about it.
|
| Our coding style requires semicolons for uninitialized
| variables, so you'll see
|
| ```
| local x;
| if flag then x = 12 else x = 24 end
| ```
|
| As a way of marking that the lack of initialization is
| deliberate. `local x = nil` is used only if x might remain
| nil.
| yakshaving_jgt wrote:
| > Yes, python has the semicolon but you'll only ever use it
| to put multiple statements on a single line.
|
| This is also true of Haskell btw.
| asiachick wrote:
| What do you think of implicit member access (C++, Java, C#) vs
| explicit (python, javascript)? Is there a concrete argument one
| way or the other?
|
| I feel like I prefer explicit
|
|     self.member = value
|     this.member = value
|
| vs implicit
|
|     member = value
|
| But clearly C++/Java/C# people are happy with implicit ...
| though many of them try to make it explicit by using a naming
| convention.
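|
| A small Python sketch of why the explicit form matters in a
| language that requires it: inside a method, an assignment without
| `self.` binds a new local variable and leaves the instance
| attribute alone. The class and method names are hypothetical.
|

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment_explicit(self):
        # Explicit access: updates the attribute stored on the instance.
        self.count = self.count + 1

    def increment_without_self(self):
        # Without "self.", this binds a new local variable named count;
        # the instance attribute is left untouched.
        count = self.count + 1
        return count


counter = Counter()
counter.increment_explicit()
local_value = counter.increment_without_self()

print(counter.count, local_value)
```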
| mcv wrote:
| The fact that people introduce naming conventions to keep
| track of member variables is probably the biggest
| condemnation of implicit member access. People clearly need
| to know this, so you'd better make it explicit.
|
| It's actually a bit surprising that this is one thing that
| javascript does better than Java. In most other areas, it's
| Java that's (sometimes overly) explicit.
| coopierez wrote:
| That was my single biggest pet-peeve of C++. A variable
| appears in the middle of a member function? Good luck
| figuring out what owns it. Is it local? Owned by the class?
| The super-class? (And in that case - which one?)
|
| The added mental load of tracking variables' sources builds
| up.
| logicchains wrote:
| FWIW, most C++ style guides recommend writing member
| variables like mVariableName or variable_name_ so they're
| easy to distinguish from local variables, and modern C++
| doesn't generally make much use of inheritance so there's
| usually only one class it could belong to.
| aasasd wrote:
| I can tell for certain that as a JS/Python man, every time I
| look through Java code I have to spend a bit of time when
| stumbling upon such access, until I remember that it's a
| thing in Java. Pity that Kotlin apparently inherited it.
|
| But at least, to my knowledge, in Java these things can't
| turn out to be global vars. Having this 'feature' in JS or
| Python would be quite a pain in the butt.
| linspace wrote:
| > implicit declaration of variables
|
| This is so true. I really like Julia and I know that explicitly
| declaring variables would be detrimental to adoption but I
| prefer it to the alternative, which is this:
| https://docs.julialang.org/en/v1/manual/variables-and-scopin...
| goatinaboat wrote:
| _This is one of those great ideas that sadly one needs
| experience to realize are really bad ideas. Every new
| generation of programmers has to relearn it._
|
| It's a bad idea because ASCII already includes dedicated
| characters for field separator, record separator and so on.
| These could easily be made displayable in a text editor if you
| wanted just as you can display newlines as |. Anyone who
| invents a format that involves using normal printable
| characters as delimiters and escaping them when you need them,
| is, I feel very confident in saying, grotesquely and
| malevolently incompetent and should be barred from writing
| software for life. CSV, JSON, XML, YAML, all guilty.
| spion wrote:
| how do you write them though
| teddyh wrote:
| Ctrl-\, Ctrl-], Ctrl-^ and Ctrl-_ for file, group, record
| and unit separator, respectively.
|
| However, your tty driver, terminal or program are all
| likely to eat them or munge them. Also, virtually nothing
| actually uses these characters for these purposes.
| goatinaboat wrote:
| _virtually nothing actually uses these characters for
| these purposes._
|
| Right. Which is why we have all these hilarious escaping
| and interpolation problems. And why programmers will
| never be taken seriously by real engineers. It's like we
| have cement mixed and ready to go but we decide to go and
| forage for mud instead and think that makes us cleverer
| than the cement guys.
| tjalfi wrote:
| > It's a bad idea because ASCII already includes dedicated
| characters for field separator, record separator and so on.
|
| ASCII is over 60 years old and separators haven't caught on
| yet; what's different now?
|
| > These could easily be made displayable in a text editor if
| you wanted just as you can display newlines as |.
|
| Can you name a common text editor with support for ASCII
| separators? It's a lot easier to use delimiters and escaping
| than change every text editor in the world.
|
| > Anyone who invents a format that involves using normal
| printable characters as delimiters and escaping them when you
| need them, is, I feel very confident in saying, grotesquely
| and malevolently incompetent and should be barred from
| writing software for life. CSV, JSON, XML, YAML, all guilty.
|
| All of the formats you rant about are widely used, well
| supported, and easy to edit with a text editor - none of
| these are true of ASCII separators. People chose formats they
| can edit today instead of formats they might be able to edit
| in the future. All of these formats have some issues but none
| of the designers were incompetent.
| erik_seaberg wrote:
| US-ASCII only has four information separators, and I believe
| they can only be used in a four-layer schema with no
| recursion, sort of like CSV (if your keyboard didn't have a
| comma or return key). When you need to pass an object with
| records of fields _as a field_ you're out of luck, and
| everyone has to agree on encoding or escaping them again.
|
| I think SGML (roll your own delimiters and nesting) was
| pretty close to the Right Thing(tm), but ISO has the specs
| locked down so everyone had a second-hand understanding of
| it.
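|
| A rough sketch of the nesting limitation described above, using
| only the unit separator (0x1F) and record separator (0x1E). The
| `encode` and `decode` helpers are hypothetical, not part of any
| standard. A flat table round-trips, but a field that itself
| contains an encoded table cannot be told apart from the outer
| structure without an extra escaping or encoding convention.
|

```python
US = "\x1f"  # unit separator: placed between fields
RS = "\x1e"  # record separator: placed between records


def encode(records):
    """Join fields with US and records with RS (no escaping at all)."""
    return RS.join(US.join(fields) for fields in records)


def decode(text):
    """Split on RS, then on US."""
    return [record.split(US) for record in text.split(RS)]


# A flat table survives a round trip.
flat = [["Oslo", "NO"], ["Lima", "PE"]]
round_trip = decode(encode(flat))

# Nesting: store an already-encoded table inside a single field.
inner = encode([["a", "b"], ["c", "d"]])
nested = [["outer", inner]]
broken = decode(encode(nested))

print(round_trip == flat)  # the flat case is preserved
print(broken == nested)    # the nested case is not
```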
| aasasd wrote:
| The obvious first step toward the brighter future is to
| refrain from using any and all software that utilizes the
| malevolent formats you mentioned. Doing otherwise would mean
| simply being untrue to one's own conscience and word.
| lixtra wrote:
| I'm surprised that with your experience you come to such
| unbalanced conclusions. Everything in engineering is about
| trade-offs and while your conclusions may be indisputable for
| the design goals of D they may be wrong in other contexts.
|
| 1. If I scribble some one time code etc. the probability of
| having an error coming from implicit declarations is in the
| same order of magnitude as missing out edge cases or not
| getting the algorithm right for most people. The extra
| convenience may well be worth it.
|
| 2. I would relax this: it should be clear to the programmer
| where a statement ends.
|
| 3. Go on with a warning is a sane strategy in some situations.
| I happily ruin my car engine to drive out of the desert. The
| assert might have been too strict and I know something about the
| data so the program can ignore the assert failure.
| WalterBright wrote:
| Your rationale in this and your followups are exactly what
| I'm talking about.
|
| 1. You're actually right if the entire program is less than
| about 20 lines. But bad programs always grow, and implicit
| declaration will inevitably lead you to have a bug which is
| really hard to find.
|
| 2. The trouble comes from programmer typos that turn out to
| be real syntax, so the compiler doesn't complain, and people
| tend to be blind to such mistakes so don't see it. My
| favorite actual real life C example:
|
|     for (i = 0; i < 10; ++i);
|     {
|         do_something();
|     }
|
| My friend who coded this is an excellent, experienced
| programmer. He lost a day trying to debug this, and came to
| me sure it was a compiler bug. I pointed to the spurious ;
| and he just laughed.
|
| (I incorporated this lesson into D's design, spurious ;
| produce a compiler error.)
|
| 3. I used to work for Boeing on flight critical systems, so I
| speak about how these things are really designed. Critical
| systems always have a backup. An assert fail means the system
| is in an unknown, unanticipated state, and cannot be relied
| on. It is shut down and the backup is engaged. The proof of
| this working is how incredibly safe air travel is.
| lixtra wrote:
| > 3. I used to work for Boeing on flight critical systems,
| so I speak about how these things are really designed.
| Critical systems always have a backup. An assert fail means
| the system is in an unknown, unanticipated state, and
| cannot be relied on. It is shut down and the backup is
| engaged.
|
| I ask you to reconsider your assumptions. How did this play
| out in the 737 MAX crashes? Was there a backup AoA sensor?
| Did MCAS properly shut down and the backup engage? Was manual
| overriding the system not vital knowledge to the crew?
|
| You don't have to answer. I probably wouldn't get it
| anyway.
|
| But rest assured that I won't try to program flight control
| and I strongly appreciate your striving for better software.
| WalterBright wrote:
| > How did this play out in the 737 MAX crashes?
|
| They didn't follow the rule in the MCAS design that a
| single point of failure cannot lead to a crash.
|
| > Was manual overriding the system not vital knowledge to
| the crew?
|
| It was, and if the crew followed the procedure they
| wouldn't have crashed.
| unionpivo wrote:
| I disagree with most of what you said but I want to
| specifically call out:
|
| > 3. Go on with a warning is a sane strategy in some
| situations.
|
| No, if it's sometimes OK to continue, then you should not
| assert it.
|
| Assert means "I assert this will always be true, and if it's
| not our runtime is in unknown/bad state."
|
| If you think you can recover, or partially recover,
| throw/return appropriate error, and go into
| emergency/recovery mode.
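|
| A minimal Python sketch of the distinction drawn above: an
| assertion guards an internal invariant, while a condition the
| program can recover from is reported as an ordinary error that
| the caller handles. All names here (`SensorError`,
| `parse_reading`, `average`, `safe_average`) are hypothetical.
|

```python
class SensorError(Exception):
    """Recoverable problem: the caller can retry or fall back."""


def parse_reading(raw):
    # Bad input from outside is expected now and then, so raise an
    # error the caller can handle instead of asserting.
    try:
        return float(raw)
    except ValueError as exc:
        raise SensorError(f"unreadable sensor value: {raw!r}") from exc


def average(readings):
    # Internal invariant: code in this program never passes an empty
    # list. If this fails, the program itself is wrong.
    assert len(readings) > 0, "average() called with no readings"
    return sum(readings) / len(readings)


def safe_average(raw_values, fallback):
    # Recovery path: fall back to a known value on unreadable input.
    try:
        return average([parse_reading(raw) for raw in raw_values])
    except SensorError:
        return fallback


print(safe_average(["1.0", "3.0"], fallback=0.0))
print(safe_average(["1.0", "oops"], fallback=-1.0))
```

| Note that CPython removes `assert` statements when run with `-O`,
| which is one more reason not to use them for conditions the
| program is expected to handle.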
| lixtra wrote:
| Your reactor is boiling. Your control software shut down
| with assertion failed: temperature too high, cannot display
| more than 3 digits.
|
| Downvote me if you want to open a bug ticket with the
| vendor and wait a week for the fix.
|
| Upvote me if you'd give it a try to restart with a switch
| to ignore assertions.
|
| You may abstain if you never shipped a bug.
|
| Edit: not to forget that this website runs on lisp which
| violates all three. Was it really a bad choice for the
| website?
| unionpivo wrote:
| > Your reactor is boiling. Your control software shut
| down with assertion failed: temperature too high, cannot
| display more than 3 digits.
|
| Several points:
|
| 1. Most of such critical components have several
| different and independent implementations, with analog
| backup (if possible).
|
| 2. You are arguing that one specific safety critical case,
| which 99.999% or more of programmers will never face,
| should somehow inform decisions about a general purpose
| programming language.
|
| 3. Even if you are working in such a safety critical
| situation, you should not rely on assertion bypass, but
| have a separate emergency procedure, which bypasses all the
| checks and tries to force the issue. (ever saw a --force
| flag ?)
|
| Because what happens in reality is that a developer encounters
| a bug (maybe while it's still in development), notices you
| can bypass it by disabling assertions (or they are
| disabled by default), and logs it as a low priority bug that
| never gets fixed.
|
| Then a decade later me or someone like me is cursing you
| because your enterprise app just shit the bed, and is
| generating tons of assertion warnings, even when it is
| running normally, so I have to figure out which of them
| are "just normal" program flow, and which one just caused
| an outage.
|
| I never experienced a situation like the one you described, but
| I have experienced behavior like I wrote above, too many
| times.
|
| Bottom line is:
|
| - don't assert if you don't mean it
|
| - if you need bypass for various runtime checks, code one
| in explicitly.
|
| Edit: Hacker News is written in Arc, which is a Scheme
| dialect. Arc doesn't have assertions as far as I can
| tell.
|
| Arc doesn't have its own runtime and runs on the Racket
| language, which has optional assertions that exit the
| runtime if they fail:
| https://docs.racket-lang.org/ts-reference/Utilities.html
| jrockway wrote:
| I agree with this. Nuclear reactors are a special case of
| systems where removing energy from the system makes it
| more unsafe, because it generates its own energy and
| without a control system it will generate so much energy
| that it destroys itself (and due to the nature of
| radiation, destroys the surrounding suburbs too).
|
| With most systems, the safest state is off. CNC machine
| making a weird noise? Smash that e-stop. Computer
| overheating? Unplug it. With this in mind, "assert"
| transitions the system from an undefined state to an
| inoperative state, which is safer.
|
| That isn't to say that you want bugs in your code,
| and that energizing some system is free of consequences.
| Your emergency stop of your mill just scrapped a $10,000
| part. Unplugging your server made your website go down
| and you lost a million dollars in revenue. But, it didn't
| kill someone or burn the building down, so that's nice.
| WalterBright wrote:
| See my previous reply. Your reactor design is susceptible
| to a single point of failure, and, how do I say it
| strongly enough, is an utterly incompetent design.
| Bypassing assertions is not the answer.
| tralarpa wrote:
| > 1. If I scribble some one time code
|
| .... and here is another entry for Walter's list of bad
| ideas:
|
| 4. "It's okay. I will use this code only once"
| erik_seaberg wrote:
| My favorite Red Green quote is "now, this is only temporary
| ... unless it works."
| jhomedall wrote:
| F#, Kotlin, Python, Nim and many others all seem to get by fine
| without semicolons as statement terminators.
| WalterBright wrote:
| In Python, a newline is a token and serves as a statement
| terminator.
|
| What I'm referring to is the notion that:
|
|     a = b
|     c = d;
|
| can be successfully parsed with no ; between b and c. This is
| true, it can be. But then it makes errors difficult to
| detect, such as:
|
|     a = b
|     *p;
|
| Is that one statement or two?
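|
| The point about Python treating a newline as a token can be
| checked with the standard library `tokenize` module. A minimal
| sketch, assuming CPython 3; the helper name `logical_line_ends`
| is hypothetical. It counts NEWLINE tokens, which end a logical
| line, and shows that a line break inside parentheses is not one
| of them.
|

```python
import io
import tokenize


def logical_line_ends(source):
    """Count NEWLINE tokens, each of which terminates a logical line."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(1 for tok in tokens if tok.type == tokenize.NEWLINE)


# Two statements on two lines: each line break is a NEWLINE token.
two_statements = logical_line_ends("a = b\nc = d\n")

# Inside brackets the line break is not a NEWLINE token,
# so this is a single statement spread over two physical lines.
one_statement = logical_line_ends("a = (b\n     * p)\n")

print(two_statements, one_statement)
```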
| jansan wrote:
| Norway is one of the luckiest countries in the world. They have a
| vast amount of resources, can produce their electrical energy
| entirely from hydropower, have a great democracy, a government
| they can trust, a beautiful landscape and great people.
|
| I must say that I feel a little bit of relief to see that they
| have problems that nobody else has, besides insanely expensive
| alcohol that is only sold in "wine monopoly" stores that are more
| heavily guarded than banks.
___________________________________________________________________
(page generated 2021-04-03 23:01 UTC)