[HN Gopher] Why Compilers Don't Autocorrect "Obvious" Parse Errors
       ___________________________________________________________________
        
       Why Compilers Don't Autocorrect "Obvious" Parse Errors
        
       Author : skilled
       Score  : 50 points
       Date   : 2022-04-09 05:57 UTC (1 days ago)
        
 (HTM) web link (chelseatroy.com)
 (TXT) w3m dump (chelseatroy.com)
        
       | kayodelycaon wrote:
       | > [Javascript and to some degree ruby] will try with all their
       | might to divine something runnable from what you wrote. How kind
       | of them, right?
       | 
       | Maybe this is semantics, but a loose syntax is different than the
       | language trying to automatically correct mistakes.
       | 
       | JavaScript has optional semicolons and braces. The semicolons
       | seem to fall into the autocorrecting category because you are
       | supposed to use them. Optional braces are a language feature
       | shared with C.
       | 
       | Ruby has optional parentheses on method calls, which is usually
       | fine until you attempt to write `a(b(4))` as `a b 4`. It's easy
       | to hit a syntax error when writing code like that. But the fact
       | that it gives you a syntax error when it hits an unclear
       | structure means this is a (mis-)feature, rather than an attempt
       | at guessing what you meant.
        
         | zzo38computer wrote:
         | I really dislike the automatic semicolon insertion feature of
         | JavaScript. (It is, in my opinion, one of the worst features of
         | JavaScript.)
         | 
         | (A preprocessor could be used to fix it if wanted, I suppose,
         | but then it must be preprocessed and converted)
        
         | mannykannot wrote:
         | These two things are related in that autocorrection tends to
         | create de-facto loose syntax. Once programmers become aware of
         | it and it becomes part of the language as it is used, the
         | language specification becomes more complicated - and possibly
         | extremely so, as every edge case between what the current
         | parser can and cannot correct correctly becomes part of that
         | specification.
        
         | zamadatix wrote:
         | I've always thought of ASI as more "statements are separated
         | for you but if you need them to be separated in a special way
         | you can add semicolons to manually control separation behavior"
         | than a "you forgot semicolons, let me fix your code for you".
         | Pretty much the same as the argument for optional braces being
         | a language feature not an auto-correction.
         | 
         | That said when you look at either in terms of how they are
         | implemented it'll seem like a correction feature. I think the
         | real difference between auto-correction and optional syntax is
         | simply whether or not the language spec designed it to be
         | optional.
        
           | pdpi wrote:
           | Semicolon insertion in Javascript is nothing like braces.
           | 
           | The grammar for e.g. an if statement is simple: *if (*
           | _expression_ *)* _statement_ *else* _statement_. One
           | particular value of _statement_ is a block statement, which
           | is where the braces come from. Nothing more, nothing less.
           | 
           | Inversely, the grammar specifically says that most statements
           | (of types empty, expression, do-while, continue, break,
           | return, throw, and debugger) must end with a semicolon, and
           | ASI is explicitly described as a few cases where you're
           | allowed to add an extra token to the token stream when the
           | grammar refuses to accept the stream as-is.
        
           | MatmaRex wrote:
           | That is a sensible way to think about it, and it would be
           | great if the language worked that way, but unfortunately it
           | does not. Statements in JavaScript are not separated by line
           | breaks.
           | 
           | Here is an example to illustrate:
           |     console.log('a')
           |     (1 < 2) ? console.log('b') : console.log('c')
           | 
           | You might expect this to output 'a', then 'b'. However, it
           | instead outputs 'a' and then throws an error like this:
           | Uncaught TypeError: console.log(...) is not a function
           | 
           | ...because a semicolon was _not_ inserted at the end of the
           | first line.
        
           | plorkyeran wrote:
           | Lua has very similar rules to JS regarding semicolons from
           | the user's perspective: semicolons are optional, and only
           | change a program's meaning in some very unusual edge cases.
           | From a PL design perspective, they're fairly different,
           | though. Lua's grammar doesn't have a statement terminator,
           | and the language just lets you insert semicolons anywhere
           | that doesn't appear inside a statement if you so wish. JS's
           | grammar does have a statement terminator, but has rules for
           | inferring it in some places when it's not present.
           | 
           | Does this distinction matter in practice? Probably not. The
           | more important difference is probably just that JS has more
           | unfortunate edge cases related to semicolons than Lua.
        
       | gumby wrote:
       | In the early 70s, Warren Teitelman (also inventor of Undo; his 67
       | PhD thesis for Minsky was on what we would call an IDE today)
       | developed a feature for Interlisp called DWIM, for Do What I
       | Mean. It would figure out that you'd forgotten a paren or
       | mistyped a function name and would rewrite your code for you.
       | 
       | It was good the way autocorrect is good today, and I hated it,
       | but you couldn't switch it off because it was also used for macro
       | expansion!
        
         | lispm wrote:
         | Given that Interlisp is available, one might even try it out
         | today.
         | 
         | https://interlisp.org/
         | 
         | The manual entry for DWIM:
         | 
         | https://interlisp.org/IRM_files/content.htm#bookmark20
        
       | williamstein wrote:
       | The LaTeX compiler tries to correct many parse errors.
        
         | jethkl wrote:
         | As a longtime LaTeX user, I find this behavior irritating; not
         | once has it saved me effort. I would much rather have LaTeX
         | exit to the shell immediately and report a good error. Instead,
         | the dynamic "fix" is applied in an opaque manner, so it is
         | totally unclear what the compiler did. It optimizes for "patch
         | the input so compilation runs to completion" instead of "do
         | what the user wanted", and once LaTeX has finished, the user
         | still must manually edit the original file to get a proper
         | solution.
        
         | bee_rider wrote:
         | This is the most annoying thing about LaTeX, IMO. The "wait,
         | how many edits back was the error" game is no fun.
        
       | imtringued wrote:
       | The compiler is asking the developer to state his intent.
       | 
       | The fact that some programming languages are overly pedantic is
       | part of their design.
        
       | rhdunn wrote:
       | This is the difference between building a parser for a compiler
       | and for an IDE.
       | 
       | For a compiler, you want to stop at the first error. The parser
       | can also emit an intermediate representation as it goes, so that
       | what it is processing is not necessarily serializable to the
       | original code. This makes it difficult to use as a data model for
       | tools like IDEs.
       | 
       | For an IDE, you want to process the entire file, recovering from
       | errors as you go. This is so that the IDE can keep things like
       | function resolution working without turning the entire file red
       | as the user is typing code, while ideally only updating the
       | references that haven't changed. It also allows the IDE to offer
       | different fixes and auto-complete functionality.
       | 
       | This makes it difficult to share parser logic between the two.
        
         | duped wrote:
         | It's only difficult to reuse logic from a batch compiler in a
         | responsive compiler. It's trivial to derive a batch compiler
         | from a responsive one.
         | 
         | You do not actually want to stop at the first error in either
         | case. You want to accumulate all the errors at a given phase of
         | the compilation and halt. Sometimes that allows other phases to
         | progress (for example, you do not want an error in one
         | compilation unit to halt the compilation of any other units
         | until linking in an incremental compiler).
         | 
         | You do actually want to reuse IRs in the IDE, otherwise it can
         | be extremely difficult to get certain things correct (and some
         | are next to impossible, like macro expansion/syntax extensions,
         | decompilation of libraries, etc).
         | 
         | Unification of the reference compiler and IDE backend is
         | extremely desirable, in my opinion. Very few languages take
         | that tack (C#/.NET being a major exception) but not because
         | it's a _bad_ design - it's because it's _hard._ Writing a
         | lexer and parser is easy if you don't care about edits and
         | updates. And there are very few parser generators that do make
         | that possible (tree-sitter being the major exception). And once
         | you have a working lexer/parser it is difficult to replace it
         | in your compiler, so few language devs ever take that approach.
         | 
         | It's essentially a massive engineering effort in something that
         | is rather boring for low payoff in the early life of a language
         | implementation, the RoI is only obvious much later when usage
         | scales up. So it's unsurprising many languages do not do it
         | early on, and like objects, most languages die young.
        
         | curun1r wrote:
         | > makes it difficult to share parser logic between the two
         | 
         | It's difficult to reuse traditional compiler logic in an IDE,
         | but there's good examples that the reverse isn't true. IDE
         | validation of language semantics is a strictly more complex
         | problem, but if you start by solving that, it's not as hard to
         | add a compiler backend. A compiler's job is to either take
         | correct code and translate to its compiled form or take
         | incorrect code and report errors. There's no reason an IDE-
         | focused parser/compiler can't do both.
         | 
         | IIRC, Microsoft talked publicly about how they built the C#
         | compiler as IDE-first and found that it simplified things
         | greatly. And I think there have been substantive discussions
         | within the Rust community about bringing parts of rust-analyzer
         | into the official compiler whereas the RLS approach of reusing
         | compiler APIs wasn't able to provide a reasonable IDE
         | experience.
        
         | phillipcarter wrote:
         | This isn't necessarily correct. Many modern compilers (e.g., C#
         | and F# compilers) will do error recovery and keep processing as
         | far as they can go, accumulating errors in the process. These
         | same compilers are not different from that used in the IDE -
         | they are one and the same. And finally, modern compilers can
         | also be tuned based on usage, such as enabling batch mode to
         | optimize for speed in a single thread or using as many threads
         | as are available to optimize for IDE scenarios.
        
         | nicoburns wrote:
         | Why don't you want to do the IDE-style parsing in a compiler?
        
           | kkdaemas wrote:
           | That can be desirable but there are a few challenges:
           | 
           | - The compiler code becomes more complicated, making
           | correctness harder
           | 
           | - The compiler might become slower to run
           | 
           | - Introducing new language features may become harder, again
           | due to code complexity
        
             | phillipcarter wrote:
             | Compilers will get plenty complicated without IDE scenarios,
             | trust me on that one. Slowness is also never really a thing
             | to worry about here, especially because usage patterns in
             | an IDE vs. a batch process are so different. It's almost
             | always the other way around: someone writes something
             | that's completely fine for a batch process but tanks IDE
             | performance.
        
             | oauea wrote:
             | > The compiler code becomes more complicated, making
             | correctness harder
             | 
             | > Introducing new language features may become harder,
             | again due to code complexity
             | 
             | It'll be written for IDEs anyway. Might as well reuse if
             | possible, right?
        
           | onei wrote:
           | I was a little curious about this too. It's contrary to what
           | I see in the Go and Rust compilers. My understanding was that
           | it's good to have a go at parsing all input if possible so
           | the end-user can batch fix mistakes, but it's unreasonable to
           | expect error checking in post-parsing steps to occur if there
           | are parse errors because the AST is almost certainly
           | incomplete.
        
           | mannykannot wrote:
           | From time to time, I see errors in IDE parsing. It's not a
           | big deal there, but it would be in a compiler or interpreter.
        
             | duped wrote:
             | What case would introduce a parsing error in an IDE that
             | isn't the case in a compiler?
        
               | mannykannot wrote:
               | I figure it is on account of the desirable situation you
               | describe in your other post not obtaining: in order to
               | satisfy the goals of the IDE, it attempts to go beyond
               | where the compiler parser would stop, as the compiler is
               | more of a batch than a responsive one, and sometimes the
               | IDE gets it wrong. As you say, batch to responsive is the
               | difficult way to go.
               | 
               | In addition, I suppose that there are people hard at work
               | applying ML in tools to help understand incomplete code
               | and mitigate the false positive problem of traditional
               | static analysis. I can imagine probabilistic parsing
               | being useful in this case, but not so much in compiling.
        
               | hombre_fatal wrote:
               | Bad language plugins in an IDE can show you this.
               | Sometimes I'll be using a niche language with someone's
               | side-project plugin that has some issues even though it's
               | correct, like when its file-formatter can't parse the
               | code and fails with an error even though it's valid code
               | for the compiler.
        
               | duped wrote:
               | If the plugins used the same parser as the compiler this
               | wouldn't be an issue?
        
               | [deleted]
        
       | shadowgovt wrote:
       | All of the points this article makes are good and true, but
       | something I've never, ever seen and am wondering why is a
       | compiler feedback loop that corrects the code with human input.
       | 
       | The compiler is smart enough to guess what variable I meant when
       | I misspell a variable. How come nobody's ever given me a tool to
       | close the loop and when the error is reported, confirm that I
       | want my source code edited to correct and correct it in place?
        
         | aardvark179 wrote:
         | Because the compiler probably doesn't have a stdin it can use
         | at that point so you'd need to thread a feedback mechanism
         | through the entire build process. You're better off building
         | that as a protocol that can be used by IDEs or LSP clients.
        
           | shadowgovt wrote:
           | Excellent point. I can't think of a protocol that supports it
           | either... Likely one exists and I haven't encountered it.
        
       | callmeal wrote:
       | Well that's how you end up with
       | https://github.com/mattdiamond/fuckitjs
       | Javascript Error Steamroller
       | 
       | FuckItJS uses state-of-the-art technology to make sure your
       | javascript code runs whether your compiler likes it or not.
       | 
       | Technology
       | 
       | Through a process known as Eval-Rinse-Reload-And-Repeat,
       | FuckItJS repeatedly compiles your code, detecting errors and
       | slicing those lines out of the script. To survive such a violent
       | process, FuckItJS reloads itself after each iteration, allowing
       | the onerror handler to catch every single error in your terribly
       | written code.
        
       | js2 wrote:
       | > Have you ever heard that phrase about how "Every happy family
       | is the same, but every unhappy family is unhappy in their own
       | way?"
       | 
       | This really deserves proper attribution: it's the opening
       | sentence of _Anna Karenina_ by Leo Tolstoy.
        
         | ouid wrote:
         | Tangent ahead. There's the sort of folksy saying "Don't argue
         | with an idiot, they'll bring you down to their level and beat
         | you with experience" (often attributed to Mark Twain, so I'm
         | leaving it anonymous), and I think that these sayings are
         | trying to get at the same thing. When someone believes
         | something wrong, it could literally be because of any other
         | wrong belief that they have. Trying to untangle that structure
         | is intractable.
        
           | NateEag wrote:
           | > "Don't argue with an idiot, they'll bring you down to their
           | level and beat you with experience"
           | 
           | In a further tangent, I've long been fond of a mildly-related
           | idiom (whose source I do not know) which instructs the
           | listener
           | 
           | "Never wrestle with a pig. You both get dirty and the pig
           | likes it."
        
       | gpderetta wrote:
       | These days compilers provide fix-it hints that are very useful. I
       | find them especially convenient for misspellings and printf
       | format string errors, especially when coupled with on the fly
       | error checking.
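
A minimal sketch of how a "did you mean" hint for a misspelled identifier could be produced, assuming the compiler already has the list of names in scope. The function names `suggest_fix` and `diagnose` are hypothetical, and real compilers use their own distance metrics; this one leans on Python's standard `difflib`. Note that it only reports a suggestion and never edits the source.

```python
import difflib


def suggest_fix(unknown_name, names_in_scope, cutoff=0.6):
    """Return the closest in-scope name, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        unknown_name, names_in_scope, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None


def diagnose(unknown_name, names_in_scope):
    """Build an error message with an optional hint; the source is not modified."""
    hint = suggest_fix(unknown_name, names_in_scope)
    message = f"error: use of undeclared identifier '{unknown_name}'"
    if hint is not None:
        message += f"; did you mean '{hint}'?"
    return message


if __name__ == "__main__":
    print(diagnose("totl_count", ["total_count", "index", "buffer"]))
```

The compilation still fails here; the guess is surfaced for the programmer to accept or reject, which is roughly the distinction several comments in this thread draw between a hint and an autocorrection.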
        
       | neilv wrote:
       | I recall a classmate (who'd gotten the unobtainable new Apple
       | IIgs at the time) mentioning the compiler he was running giving
       | an error message, which he described as "I see you forgot a
       | semicolon; should I add it for you?" The UCSD Pascal we were
       | using on shared school IIe and II+ didn't.
       | 
       | Years later, when I was meeting with Tim Berners-Lee, he wanted
       | to see the doc for a Web-related Scheme library I had with me,
       | and he started speed-reading it in front of me. The doc had an
       | irreverent criticism I'd thrown in, about the practice of overly-
       | permissive parsers in Web browsers. In the days of dotcom gold
       | rush, when anyone who could spell "HTML programmer" was getting
       | truckloads of investment money dumped on them, I'd proposed a
       | very prominent angry red browser error indicator for Web pages
       | with invalid HTML. I thought that having that could be a
       | source of shame, like the creator of it didn't know Web, and all
       | the people tossing around money blindly and not knowing who to
       | invest in might take that as one indicator. :) (Sir Tim later
       | gave a big talk endorsing Python for the Web, but he did
       | reference one of my arguments for why I was adopting Scheme at
       | the time.)
       | 
       | "Conservative in what you send, liberal in what you accept"
       | seemed a good default model for protocol interoperation,
       | especially in an environment of legacy systems and imperfectly-
       | specified protocols. But Web was new, and HTML was often being
       | handwritten, and having the Web browser silently accept invalid
       | and often ambiguous HTML without giving any indication it was
       | wrong _even during development_ seemed to create an unnecessary
       | mess.
       | 
       | I actually had to spend a chunk of last weekend dusting off some
       | code to handle that mess, because another open source developer
       | was still running into the mess:
       | https://www.neilvandyke.org/racket/html-parsing/#%28part._.H...
        
         | Animats wrote:
         | I liked that idea in the early days. I wanted to drop back to
         | default mode, with default fonts and layout, after displaying
         | the first error. But this was when HTML by itself mostly
         | defined the page layout.
         | 
         | The HTML5 spec has a long, detailed set of rules for
         | consistently parsing bad HTML. They're very funny to read. That
         | was the best anyone could do at that late date.
        
       | bwanab wrote:
       | My 2nd CS course at college was (I'm showing my age here) PL1
       | programming. The Watfiv compiler would correct obvious parse
       | errors. Often, this would lead to much more insidious and not-so-
       | obvious bugs down the line.
        
         | njacobs5074 wrote:
         | The Pascal compiler on our CDC Cyber (used by undergraduate
         | courses when I went to NYU) would do this, too. Yeah, was a
         | long time ago :)
        
           | julian55 wrote:
           | I also remember when compilers made more of an effort to fix
           | trivial errors. It was worth it when using a batch system and
           | you could only run a few compilations each day.
        
         | vincent-manis wrote:
         | I think you're thinking of Cornell's PL/C compiler (Watfiv was
         | for Fortran, and didn't have a lot of error-correction), circa
         | 1970ish. PL/C would famously convert
         | 
         | PTU LIST('Hello, world!
         | 
         | into a valid program (in fact, the claim was that it would
         | never fail to convert any string of text into a valid program).
         | 
         | PL/C made a lot of sense when short student programs were
         | entered on punched cards (and hence trivial typos were tedious
         | to correct) and batch turnaround times were measured in hours.
         | This makes much less sense when (a) editors can give us clues
         | about typos right away, e.g., by indenting in a surprising way,
         | and (b) compile times for short modules are very short.
        
       | mikerg87 wrote:
       | This was tried in the 80s with teaching Pascal compilers. The
       | compiler would fix a problem, issue a warning, and continue.
       | However, it would then carry on with what was left and issue
       | strange and bewildering messages to the poor student.
       | 
       | Turbo Pascal at the time would just stop at the first problem, so
       | the student could focus on addressing that one and only one issue
       | at a time. Yes it was a game of whack-a-mole with syntax errors
       | but at least it was a straightforward process to getting
       | something to compile.
        
         | agumonkey wrote:
         | > game of whack-a-mole with syntax errors
         | 
         | first time I ever "wrote" a program was hand copying one from
         | PC Magazine. Knowing nothing about pascal syntax nor semantics,
         | what you said describes that whole week of mine.
        
         | ufo wrote:
         | I think this is the main issue. Compilers have always been
         | trying to do some kind of syntax error recovery, to be able to
         | spot more than one syntax error at a time. However, these
         | heuristics are unfortunately fragile. It often ends up with
         | cascading syntax errors where you're better off ignoring the
         | errors after the first one. Not to mention that many error
         | recovery heuristics tend to skip over some statements, which
         | means they can't be used to "autocorrect" the program.
         | 
         | A while back I looked at how several languages implement this
         | and Pascal was actually one of the better ones. It is a very
         | hard problem...
        
         | ekidd wrote:
         | > Yes it was a game of whack-a-mole with syntax errors but at
         | least it was a straight forward process to getting something to
         | compile
         | 
         | It also helps that Turbo Pascal was an extremely fast compiler
         | for its time. So you could fix one error, rerun the
         | compiler, and get another error quickly.
        
       | DancesWTurtles wrote:
       | "If the computer knows I'm missing a semicolon here, why won't it
       | add it itself?"
       | 
       | The computer does not know that. The computer is being too smart.
       | And probably wrong.
        
       | cuteboy19 wrote:
       | I want to disagree with him, that sometimes there _is_ an
       | unambiguous way for the compiler to solve such errors. But the
       | road to hell is paved with good intentions, as anyone working
       | with MATLAB will have experienced firsthand. Instead of trying to
       | guess what the programmer meant, a compiler should just be a dumb
       | machine that does what we tell it to do.
        
       | caditinpiscinam wrote:
       | The risk analysis argument makes sense from a language-usage
       | perspective. I think there's a language design argument we can
       | make as well:
       | 
       | 1) if your language papers over a syntax error, then that error
       | is effectively just an alternative syntax
       | 
       | 2) alternative syntaxes make a language more complex
       | 
       | 3) complex languages take more work to implement and more work to
       | learn
        
         | DonHopkins wrote:
         | PHP has entered the chat.
        
         | goto11 wrote:
         | This is what happened with HTML. At first the syntax was pretty
         | simple, but the error-correcting parsing meant that invalid
         | syntax became widespread. In order to ensure compatibility
         | between implementations the spec ended up having to specify the
         | exact parsing of any form of incorrect HTML also. Now the
         | parsing algorithm of HTML is incredibly complex:
         | https://html.spec.whatwg.org/multipage/parsing.html
        
       | amelius wrote:
       | > Because, as smart as we compiler designers think we are, you,
       | dear programmer, know your program better than we do.
       | 
       | Copilot, however, begs to differ.
        
         | naniwaduni wrote:
         | There's a lot of hubris in ML!
        
       | mcculley wrote:
       | Some languages are more redundant than others. I remember at
       | least one Ada compiler that would make an assumption about what
       | you meant, correct the internal representation, and keep going.
       | It would fail with an error, but still allowed one to push
       | further through the compilation process. This was a big help
       | given how slow the compilers were.
        
       | Someone wrote:
       | I strongly disagree. Compilers should autocorrect "obvious" parse
       | errors. I also am not aware of any that doesn't.
       | 
       | What they shouldn't do is produce a binary based on their (smart
       | or stupid) guesses about the programmer's intention.
       | 
       | That allows you to compile, fix multiple typos, compile, instead
       | of compile, fix one typo, compile, fix the next typo, compile,
       | etc, _and_ prevents you from running a program that you didn't
       | write.
       | 
       | I am not aware of any compiler that doesn't do this, as it would
       | be extremely annoying to have a compiler give up at the first
       | error.
       | 
       | The search term to use is parser error recovery. It doesn't give
       | obviously great hits, though. Sample hits:
       | 
       | - https://www.geeksforgeeks.org/what-is-error-recovery/
       | 
       | - https://cs.adelaide.edu.au/~charles/lt/Lectures/07-ErrorReco...
       | 
       | - https://en.wikipedia.org/wiki/Burke-Fisher_error_repair
       | 
       | - https://en.wikipedia.org/wiki/LR_parser#Syntax_error_recover...
        
         | Retr0id wrote:
         | > I also am not aware of any that doesn't.
         | 
         | Can you give an example of CPython, or GCC/clang autocorrecting
         | a parse error?
        
           | borodi wrote:
           | They do auto correction while parsing. If you have some C code
           | and forget a semicolon or something else, it will try to
           | guess a way to fix it in order to parse the rest of the
           | program and report any other errors it finds. There is even
           | the -Wfatal-errors flag to disable this functionality.
        
           | chrisseaton wrote:
           | Have you ever seen GCC or Clang give more than one error
           | message for a program? They're able to provide more than one
           | error message because they correct the first error, and then
           | continue parsing.
        
             | vincent-manis wrote:
             | For some value of "correct". Actually, it's often more like
             | skipping a few tokens until some kind of synchronization
             | point (e.g., a semicolon) is reached. It's good manners for
             | a compiler that does this to refuse to produce a binary.
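
The "skip to a synchronization point" strategy described above is often called panic-mode recovery. Here is a rough sketch for a made-up toy grammar (statements of the form `name = number ;`, tokens separated by whitespace). The names `parse_statements` and `compile_source` are hypothetical and this is not how GCC or Clang are implemented; it only illustrates the idea of collecting several errors in one pass while refusing to produce output.

```python
def parse_statements(source):
    """Parse statements of the toy form `name = number ;`.

    Returns (statements, errors). On a malformed statement the parser
    records one error, skips ahead to just past the next ';' (the
    synchronization point), and resumes parsing from there.
    """
    tokens = source.split()
    statements, errors = [], []
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + 4]
        well_formed = (
            len(window) == 4
            and window[0].isidentifier()
            and window[1] == "="
            and window[2].isdigit()
            and window[3] == ";"
        )
        if well_formed:
            statements.append((window[0], int(window[2])))
            pos += 4
            continue
        errors.append(f"syntax error near token {pos}: {tokens[pos]!r}")
        # Panic mode: discard tokens up to and including the next ';'.
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1
    return statements, errors


def compile_source(source):
    """Report every error found, and refuse to emit output if there are any."""
    statements, errors = parse_statements(source)
    if errors:
        return None, errors
    return dict(statements), []


if __name__ == "__main__":
    result, errors = compile_source("a = 1 ; b 2 ; c = 3 ; d = = 4 ; e = 5 ;")
    print(result)   # None, because errors were found
    for error in errors:
        print(error)
```

With this sketch a missing semicolon, as in `a = 1 b = 2 ;`, swallows the following statement along with the broken one, which is the kind of skipped-statement behaviour that makes such recovery unsuitable as an autocorrect.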
        
           | zauguin wrote:
           | As a clang++ demonstration see
           | https://godbolt.org/z/bTMT6qd4f. The failing `static_assert`
           | uses the variable `i` which comes from the line missing a
           | semicolon. It only reaches this assert because the compiler
           | internally fixed the first error.
        
         | gpderetta wrote:
         | Not sure why you are being downvoted but you are completely
         | right.
         | 
         | A few years ago GCC wasn't as good at error recovery, so the
         | "too many errors, bailing out" message was a common occurrence
         | (code for Internal Compiler Error, but managed to print at
         | least one diagnostic). Today it is much much rarer to encounter
         | it and using the compiler is a much more pleasant experience.
        
         | phillipcarter wrote:
         | Unfortunately this isn't so simple. Autocorrecting to something
         | could result in picking something that's actually wrong and
         | impacting whole program semantics, either causing no errors
         | when there should be errors, or causing errors when there
         | should be none. That could be extremely confusing for new
         | developers.
         | 
         | Instead, compiler authors need to understand and prioritize
         | good ergonomics. Diagnostics should be accurate, come with
         | suggestions, have unique error codes you can look up, and
         | follow patterns you can predict over time.
        
         | syrrim wrote:
         | As long as compilation is instant, or nearly so, recompiling
         | after each fixed error is likely preferable. The reason being
         | that the error correction is frequently flawed, so that
         | anything beyond the first error is usually suspect, and often
         | will disappear after the first error is fixed. The novice
         | programmer may not realize this, and thus become quite confused
         | when they try to diagnose the later errors, only to realize
         | they don't exist.
        
       | seanwilson wrote:
       | > Because, as smart as we compiler designers think we are, you,
       | dear programmer, know your program better than we do.
       | 
       | When you compile a file successfully, make an edit, then
       | compilation fails, are there any compilers/IDEs that compare the
       | before-and-after of the file to create better error messages? The
       | compiler would have a lot of extra information this way because
       | files usually change gradually and not all at once.
       | 
       | I'm thinking of cases where you're refactoring some code but miss
       | out a bracket: the compiler says the missing bracket could be
       | anywhere down to the bottom of the file, but anyone watching
       | intuitively knows which block of code the missing bracket likely
       | falls into, usually localised around what you just edited.
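       | 
       | A rough sketch of that before-and-after comparison, assuming the
       | tool keeps a copy of the last version that compiled. The name
       | changed_line_ranges is hypothetical; difflib is from the Python
       | standard library.

```python
import difflib


def changed_line_ranges(last_good, current):
    """Return 1-based (first, last) line ranges of `current` that differ
    from `last_good`, the last version known to compile."""
    matcher = difflib.SequenceMatcher(
        a=last_good.splitlines(), b=current.splitlines(), autojunk=False
    )
    ranges = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # For a pure deletion j1 == j2, so point at the line that
            # now follows the deleted text.
            ranges.append((j1 + 1, max(j2, j1 + 1)))
    return ranges


if __name__ == "__main__":
    last_good = "def f(x):\n    return (x + 1)\n\ndef g(y):\n    return y * 2\n"
    current = "def f(x):\n    return (x + 1\n\ndef g(y):\n    return y * 2\n"
    print("recently edited lines:", changed_line_ranges(last_good, current))
```

       | A diagnostic could then mention the edited range as the likely
       | location of the missing bracket, in addition to wherever the
       | parser itself gave up.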
       | 
       | > elsif say_goodbye nd we_like_this_person:
       | 
       | > If the compiler tried to automatically add a colon, I'd have
       | two colons and the code is even wronger.
       | 
       | Couldn't a smarter compiler guess that, because
       | "we_like_this_person" and "say_goodbye" are defined variables and
       | no variable looks similar to "nd", "nd" should probably be the
       | "and" keyword?
       | 
       | I'm surprised by how unhelpful error messages still are for most
       | tools. I'm curious how much this is because it's a very hard
       | problem rather than a neglected area that developers accept as
       | normal. I heard Elm is meant to be good here (where strong
       | static typing allows for certain kinds of hints):
       | https://elm-lang.org/news/compiler-errors-for-humans
        
         | MereInterest wrote:
         | > because files usually change gradually and not all at once.
         | 
         | As a counterexample, suppose you are checking out a new version
         | of a file, or a new branch with many changes across many files.
         | Identifying this usage would require the compiler to be aware
         | of the version control system, and still wouldn't correctly
         | identify that the version sent from $COWORKER via email for
         | some weird reason isn't a gradual change.
         | 
         | For me personally, debugging is difficult enough without
         | needing to worry that the compiler is going to maintain state
         | across multiple runs. If I see an error message that is
         | different at all, I assume that means I'm triggering a
         | different failure mode, and debug accordingly.
         | 
         | Edit: That said, the Rust compiler is tremendous with error
         | messages, without relying on time-dependent state. If a
         | variable is misspelled, it will look for similarly named
         | variables that are in scope, and ask if you meant one of them.
         | But this behavior is still consistent for a given file and
         | compiler version.
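         | 
         | A small sketch of that kind of stateless "did you mean"
         | suggestion, using difflib from the Python standard library.
         | The function name suggest_name and the 0.6 cutoff are
         | assumptions for illustration, not rustc's actual algorithm.

```python
import difflib


def suggest_name(unknown, names_in_scope):
    """Build a diagnostic for an unknown identifier, with a suggestion
    when some name in scope is similar enough."""
    matches = difflib.get_close_matches(unknown, names_in_scope, n=1, cutoff=0.6)
    if matches:
        return f"unknown name '{unknown}'; did you mean '{matches[0]}'?"
    return f"unknown name '{unknown}'"


if __name__ == "__main__":
    scope = ["print_results", "parse_input", "total"]
    print(suggest_name("prnit_results", scope))
    print(suggest_name("xyz", scope))
```

         | The result depends only on the misspelled name and the names
         | in scope, so the same file always produces the same message.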
        
           | scythe wrote:
           | Couldn't you just gate the comparison with compiler flags and
           | define corresponding make targets? Flags could also identify
           | the VCS and how the old file should be obtained.
        
           | [deleted]
        
           | jancsika wrote:
           | > For me personally, debugging is difficult enough without
           | needing to worry that the compiler is going to maintain state
           | across multiple runs.
           | 
           | Ooh, the idea makes me shudder.
           | 
           | I remember looking at a project's makefile which called a
           | custom build script where the README said to run the
           | buildscript twice-- once to generate some state, and a second
           | time to compile stuff using that state.
           | 
           | Without any comments provided, the makefile called the custom
           | buildscript _three_ times in a row.
           | 
           | I can't even imagine the superstition and cargo culting that
           | would arise from an IDE "helping out" by analyzing who has
           | changed what, when, and in what order they changed it.
           | 
           |  _Please paste a new empty function named "momo" here before
           | doing a release. Also make sure your blinds are closed before
           | compiling._
        
         | avar wrote:
         | Seems relatively easy to implement by having your "make" step
         | commit to git, then on a compilation error show the diff along
         | with the error.
         | 
         | Then simply rebase out the intermediate steps before pushing.
        
         | abecedarius wrote:
         | > compare the before-and-after of the file to create better
         | error messages?
         | 
         | There was a PhD thesis I read in the 90s that included a
         | version of this idea. I forget the specifics.
        
       | nneonneo wrote:
       | For Python specifically, the colon is always used to introduce a
       | block, which makes parsing somewhat easier as well as being
       | consistent throughout the language.
       | 
       | However, one could easily imagine a design where certain keywords
       | automatically introduce a block after the current line, which
       | would eliminate the need for the colon. It would prevent one-
       | liners (e.g. "if x: y") but that's no big loss. The colon would
       | continue to be used for e.g. lambda, dict and annotation syntax.
        
       | carlhjerpe wrote:
       | I'd like to be able to run the compiler in interactive mode where
       | it asks to apply the fix along with the diff.
        
       | kazinator wrote:
       | If you correct errors without failing the compilation due to a
       | nonzero error count, then you've essentially forked the language.
       | What is an error in the standard language is a _de facto_
       | nonconforming extension in your implementation of it, and the
       | users will be in for a nasty surprise when they try to port their
       | code to another implementation.
       | 
       | Compilers used to correct obvious parse errors a lot more than
       | they do now. The goal wasn't to make the program pass compilation
       | so that the user can ignore the error messages. That would be
       | harmful, as noted above. The goal is to be able to continue
       | processing the program and uncover more errors in it in a useful
       | way.
       | 
       | There is a gamble there:
       | 
       | - if you make a good correction to the token stream, all is well:
       | you can diagnose more errors later in a pertinent way.
       | 
       | - if the correction is wrong, then the compiler may emit a flurry
       | of nonsense errors which are caused by the correction, so that
       | only the first diagnostic makes any sense.
       | 
       | There is a third risk:
       | 
       | - the correction may lead to looping. This risk exists in any
       | correction that lengthens the token sequence. The compiler may
       | have to quit when the error count reaches some defined maximum.
       | The looping may otherwise be infinite, or possibly unpredictable
       | in length (think Hailstone Sequence).
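       | 
       | The looping risk and the error cap can be sketched with a toy
       | parser whose only repair is to insert the token it expected.
       | The `ID ;` grammar, the name parse_statements and the default
       | cap of 10 are assumptions made up for this example.

```python
def parse_statements(tokens, max_errors=10):
    """Parse a sequence of `ID ;` statements, repairing by insertion only."""
    errors = []
    pos = 0
    while pos < len(tokens):
        for expected in ("ID", ";"):
            if len(errors) >= max_errors:
                # Without this cap, a token that no rule consumes would be
                # "repaired around" forever.
                errors.append("too many errors, giving up")
                return errors
            if pos < len(tokens) and tokens[pos] == expected:
                pos += 1  # token matches: consume it
            else:
                # Insertion-only repair: pretend `expected` was present and
                # do NOT consume the offending token.
                found = tokens[pos] if pos < len(tokens) else "end of input"
                errors.append(f"expected {expected} before {found}")
    return errors


if __name__ == "__main__":
    print(parse_statements(["ID", "ID", ";"]))           # a repair that works out
    print(parse_statements(["ID", ";", ")", "ID", ";"]))  # a repair that loops
```

       | In the first call the inserted ";" lets parsing continue
       | sensibly. In the second, the stray ")" is never consumed, so
       | only the cap stops the parser.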
       | 
       | In the 1970's, _Creative Computing_ magazine conducted a contest
       | to see who could produce the most error messages using the least
       | amount of code.
       | 
       | The reason old-time compilers tried to correct as many errors as
       | possible in a single run is that the programmers didn't always
       | have use of the computer; they had to produce the program using
       | keypunch
       | equipment onto punched cards, and then line up at a job
       | submission window, where an operator would submit their card deck
       | for execution. You wouldn't want to line up to fix one semicolon
       | at a time.
        
       | jokoon wrote:
       | Because no software is good at removing ambiguity. Only humans,
       | or maybe AI, are good at detecting ambiguity and removing it.
       | It requires previous experience, something computers cannot
       | draw on accurately enough. Programming languages are a bit like
       | real languages: they require human intelligence.
       | 
       | And there are still edge cases that could be ambiguous to humans,
       | so you definitely want any compiler to refuse ambiguous programs.
       | Computers are mathematical machines, they do everything without
       | asking for your permission, so you better pray their behavior is
       | well defined.
       | 
       | Look at what happens when languages are ambiguous, like JavaScript
       | or HTML: they become hard to use, and JS engines are monsters
       | whose inner workings you don't want to understand. I'm not a fan
       | of C++ and its difficulty, but it's my favorite language because
       | it's well defined.
       | 
       | Maybe compiler engineers could demonstrate how inserting
       | semicolons in some places could create undesirable situations.
       | Writing parsers is one of the toughest programming tasks, in my
       | view.
       | 
       | Rules in languages don't exist for nothing. Even duck typing has
       | a cost. It's like deciding that people can drive anywhere on the
       | road, and let people decide how to avoid each other. Sure they
       | can, and it would work 99% of the time, but 99% of the time is
       | not good enough.
        
       | dplgk wrote:
       | I don't think Python will tell you you're missing a semicolon.
        
         | silon42 wrote:
         | I prefer an explicit pass.
        
       | hannob wrote:
       | This is somewhat related to the "robustness principle" that was
       | guiding internet development in the early days. It wasn't about
       | programming languages, but about protocol data, but the issue is
       | similar.
       | 
       | Yet it turned out that doing that introduces a lot of subtle
       | security issues. Today many people have come to the conclusion
       | that the robustness principle was a mistake:
       | https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...
        
       | eklavya wrote:
       | My understanding is that, if the compiler can with 100%
       | confidence correct a mistake or a missing label then that means
       | the language has a redundant/useless syntactical appendage and we
       | can just get rid of it. That has happened with ";", in some newer
       | languages, we no longer require those.
        
       | charcircuit wrote:
       | Regarding the final example: I don't think a double colon would
       | ever be valid, so why would the "compiler" suggest making that
       | change?
        
       | jpeanuts wrote:
       | A place where autocorrect might be considered is in REPLs. Out of
       | habit I still regularly write "print 'a'" in the Python REPL
       | although I've been using Python 3 for a while. You get:
       | 
       | SyntaxError: Missing parentheses in call to 'print'. Did you mean
       | print("a")?
       | 
       | Well yes... obviously... so please just print it.
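       | 
       | For reference, the diagnostic can be reproduced outside the
       | REPL with the built-in compile(). The helper name
       | syntax_error_message is made up for this sketch, and the exact
       | wording of the message varies between Python 3 versions.

```python
def syntax_error_message(source):
    """Return the SyntaxError message for `source`, or None if it parses."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:
        return exc.msg
    return None


if __name__ == "__main__":
    print(syntax_error_message("print 'a'"))
    print(syntax_error_message("print('a')"))
```

       | The parser recognises the Python 2 form well enough to suggest
       | the fix, but it still refuses to run the code.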
        
         | DancesWTurtles wrote:
         | That would mean adding a rule to the language. The rule of "you
         | can use 'print' as a statement". A rule that, time after time,
         | has been shot down.
         | 
         | Which leads us to the real issue at hand: if the compiler is
         | going to do anything by itself, that means it is following well
         | defined rules. Therefore, whatever automatic thing the compiler
         | does is part of the language. And, sometimes, the design rules
         | of said language plainly and simply do not allow for that.
        
         | goto11 wrote:
         | Be careful what you wish for. Parsers guessing at the author's
         | intention is what gave us HTML.
        
       | phendrenad2 wrote:
       | Okay so autocorrect is a bad idea. But what I'm tearing my hair
       | out trying to figure out is why can't compilers even _detect_ an
       | "obvious" error? Why does GCC often output cryptic general-
       | purpose errors about the WRONG line of code, when I have a simple
       | typo? Why can't it have a simple rules engine that detects common
       | typos in syntax and suggests a fix to me, right in the error
       | message?
        
         | phillipcarter wrote:
         | It can be hard to say, since things can get wildly complicated
         | depending on language semantics (especially when type inference
         | is involved). But yeah, there is often a tendency in compilers
         | to react very poorly to a typo and freak out over what comes
         | _after_ it instead of going, "hmmm, this doesn't look
         | complete". In some languages, this can be because the typo is
         | actually ambiguous with something that's fine, just not in the
         | context of the code that follows. But every compiler and
         | language is different, so it's hard to say.
        
         | [deleted]
        
       | gbolcer wrote:
       | That is a great point. Eclipse for Java has, in the past 5+
       | years, started autocompleting the beginnings of expressions. As
       | someone who
       | started programming in vi, it's really enjoyable. It doesn't do
       | everything, but strings, statement completions, loops and scopes,
       | even variable naming and selection, etc all autocomplete. I
       | realize that's not quite the same as autocorrect, but it works
       | well. Second, with the OpenAI GPT-3 stuff, it does generate
       | syntactically correct code, though I haven't learned to trust it
       | enough to generate semantic instantiations of what I'm thinking.
        
       | WalterBright wrote:
       | Language syntax is redundant so that compilers can detect errors.
       | If there was no redundancy in a language, every random stream of
       | bytes would be a valid program.
       | 
       | It's like having dual-path redundancy in an airplane avionics
       | system. If they disagree, then it is clear there's a fault in one
       | of them - but it doesn't mean you can tell _which_ one is
       | erroneous. Without redundancy, there's no way to detect faulty
       | operation.
       | 
       | Guessing which parse is correct, or which avionics subsystem is
       | correct, is as bad as no redundancy at all.
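       | 
       | The detect-but-cannot-locate point can be illustrated with a
       | single parity bit. This is only an analogy, not a model of a
       | real parser or avionics system, and the helper names are made
       | up for the sketch.

```python
def with_parity(bits):
    """Append an even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def looks_valid(word):
    """The redundancy check: a valid word has an even number of 1s."""
    return sum(word) % 2 == 0


def single_flip_repairs(word):
    """Every word reachable from `word` by flipping exactly one bit."""
    return [word[:i] + [1 - word[i]] + word[i + 1:] for i in range(len(word))]


if __name__ == "__main__":
    original = with_parity([1, 0, 1, 1])
    corrupted = original[:2] + [1 - original[2]] + original[3:]
    print("error detected:", not looks_valid(corrupted))
    candidates = [w for w in single_flip_repairs(corrupted) if looks_valid(w)]
    print("equally plausible repairs:", len(candidates))
```

       | The check flags the corrupted word, yet every single-bit repair
       | passes it, so choosing one of them is a guess.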
        
         | hzhou321 wrote:
         | > Language syntax is redundant so that compilers can detect
         | errors.
         | 
         | Unfortunately, programming language syntax is not as redundant
         | as we would wish. When the code is in valid syntax, the
         | compiler can parse it. When the code isn't in valid syntax,
         | the compiler doesn't even have a valid base to parse any
         | information out of the code. The compiler writer may assume a
         | common cause for a certain parsing error and insert some
         | "meaningful" error messages, but that is very different from
         | the compiler knowing anything. The "redundant" information is
         | carried in the out-of-band channel (human vocabulary and common
         | patterns) rather than in the syntax.
        
           | WalterBright wrote:
           | If the code does not match the grammar for the language, that
           | is redundancy in the grammar.
           | 
           | The compiler can (and does, for error messages and error
           | recovery) guess at what was meant, but it cannot know what
           | was meant.
        
           | hzhou321 wrote:
           | I think you are thinking of "redundant" in terms of
           | information. A syntax does not carry any information on its
           | own. It is a frame for information to reside in.
        
       | ch_123 wrote:
       | An old-timer professor at my university used to tell the story of
       | the university's use of the PL/C compiler from Cornell[1] which
       | promised to automatically correct syntax errors in students'
       | code. This was back in the days of punched card and next-day
       | compilation times, and it was hoped that the PL/C compiler would
       | reduce the amount of compute time spent compiling bad code.
       | Instead, it would end up turning poorly thought-out code into
       | code which would crash the system or cause endless loops. Its use
       | was quickly discontinued.
       | 
       | [1] https://en.wikipedia.org/wiki/PL/C
        
       | hzhou321 wrote:
       | The compiler "guessed" your errors. This does not mean the
       | compiler "knows" your errors. A rephrase of your question is:
       | why don't compilers always assume the "guessed" correction of
       | your code? This is because the compiler doesn't know, and you
       | won't know, when it may guess wrong. And in the case of guessing
       | wrong, you will have very, very mysterious errors at best, and
       | very, very mysterious wrong application behavior (that doesn't
       | even fail) at worst.
        
         | vincent-manis wrote:
         | Actually, when I was teaching, I used to see a student strategy
         | I called "obey the compiler", which was to fix whatever the
         | compiler complained about, without thinking. If the compiler
         | said "semicolon expected at col 42", the student would put a
         | semicolon at column 42. If the compiler complained "undeclared
         | identifier prnit_results", the student would declare a name
         | prnit_results. The strategy, when followed to extremes, could
         | convert an almost-valid program into a string of nonsense
         | (post-conversion was when the student came to me for help),
         | more or less the opposite of PL/C's strategy, which I mentioned
         | in an earlier comment. To be fair, this strategy was mostly
         | found in first-year, and a few weaker second-year students.
        
         | hzhou321 wrote:
         | I think JavaScript engines do act on their guesses in the
         | semicolon case. Not everyone is happy about it.
        
       ___________________________________________________________________
       (page generated 2022-04-10 23:01 UTC)