[HN Gopher] Why Compilers Don't Autocorrect "Obvious" Parse Errors
___________________________________________________________________
Why Compilers Don't Autocorrect "Obvious" Parse Errors
Author : skilled
Score : 50 points
Date : 2022-04-09 05:57 UTC (1 days ago)
(HTM) web link (chelseatroy.com)
(TXT) w3m dump (chelseatroy.com)
| kayodelycaon wrote:
| > [Javascript and to some degree ruby] will try with all their
| might to divine something runnable from what you wrote. How kind
| of them, right?
|
| Maybe this is semantics, but a loose syntax is different than the
| language trying to automatically correct mistakes.
|
| JavaScript has optional semicolons and braces. The semicolons
| seem to fall into the autocorrecting category because
| you are supposed to use them. Optional braces are a language
| feature shared with C.
|
| Ruby has optional parentheses on method calls, which is usually
| fine until you attempt to do `a(b(4))` as `a b 4`. It's easy to
| get a syntax error writing code like that. But the
| fact it will give you a syntax error when it hits an unclear
| structure means this is a (mis-)feature, rather than an attempt
| at guessing what you meant.
| zzo38computer wrote:
| I really dislike the automatic semicolon insertion feature of
| JavaScript. (It is, in my opinion, one of the worst features of
| JavaScript.)
|
| (A preprocessor could be used to fix it if wanted, I suppose,
| but then it must be preprocessed and converted)
| mannykannot wrote:
| These two things are related in that autocorrection tends to
| create de-facto loose syntax. Once programmers become aware of
| it and it becomes part of the language as it is used, the
| language specification becomes more complicated - and possibly
| extremely so, as every edge case between what the current
| parser can and cannot correct correctly becomes part of that
| specification.
| zamadatix wrote:
| I've always thought of ASI as more "statements are separated
| for you but if you need them to be separated in a special way
| you can add semicolons to manually control separation behavior"
| than a "you forgot semicolons, let me fix your code for you".
| Pretty much the same as the argument for optional braces being
| a language feature not an auto-correction.
|
| That said when you look at either in terms of how they are
| implemented it'll seem like a correction feature. I think the
| real difference between auto-correction and optional syntax is
| simply whether or not the language spec designed it to be
| optional.
| pdpi wrote:
| Semicolon insertion in Javascript is nothing like braces.
|
| The grammar for e.g. an if statement is simple: *if (*
| _expression_ *)* _statement_ *else* _statement_. One
| particular value of _statement_ is a block statement, which
| is where the braces come from. Nothing more, nothing less.
|
| Inversely, the grammar specifically says that most statements
| (of types empty, expression, do-while, continue, break,
| return, throw, and debugger) must end with a semicolon, and
| ASI is explicitly described as a few cases where you're
| allowed to add an extra token to the token stream when the
| grammar refuses to accept the stream as-is.
| MatmaRex wrote:
| That is a sensible way to think about it, and it would be
| great if the language worked that way, but unfortunately it
| does not. Statements in JavaScript are not separated by line
| breaks.
|
| Here is an example to illustrate:
|     console.log('a')
|     (1 < 2) ? console.log('b') : console.log('c')
|
| You might expect this to output 'a', then 'b'. However, it
| instead outputs 'a' and then throws an error like this:
| Uncaught TypeError: console.log(...) is not a function
|
| ...because a semicolon was _not_ inserted at the end of the
| first line.
| plorkyeran wrote:
| Lua has very similar rules to JS regarding semicolons from
| the user's perspective: semicolons are optional, and only
| change a program's meaning in some very unusual edge cases.
| From a PL design perspective, they're fairly different,
| though. Lua's grammar doesn't have a statement terminator,
| and the language just lets you insert semicolons anywhere
| that doesn't appear inside a statement if you so wish. JS's
| grammar does have a statement terminator, but has rules for
| inferring it in some places when it's not present.
|
| Does this distinction matter in practice? Probably not. The
| more important difference is probably just that JS has more
| unfortunate edge cases related to semicolons than Lua.
| gumby wrote:
| In the early 70s, Warren Teitelman (also inventor of Undo; his 67
| PhD thesis for Minsky was on what we would call an IDE today)
| developed a feature for Interlisp called DWIM, for Do What I
| Mean. It would figure out that you'd forgotten a paren or
| mistyped a function name and would rewrite your code for you.
|
| It was good the way autocorrect is good today, and I hated it,
| but you couldn't switch it off because it was also used for macro
| expansion!
| lispm wrote:
| Given that Interlisp is available, one might even try it out
| today.
|
| https://interlisp.org/
|
| The manual entry for DWIM:
|
| https://interlisp.org/IRM_files/content.htm#bookmark20
| williamstein wrote:
| The LaTeX compiler tries to correct many parse errors.
| jethkl wrote:
| As a longtime LaTeX user, I find this behavior irritating; not
| once has it saved me effort. I would much rather have LaTeX
| exit to the shell immediately and report a good error. Instead,
| the dynamic "fix" is applied in an opaque manner, so it is unclear
| what the compiler did. It optimizes for "patch the input so
| compilation runs to completion" instead of "do what the user
| wanted", and once LaTeX has finished, the user still must
| manually edit the original file to get a proper solution.
| bee_rider wrote:
| This is the most annoying thing about LaTeX, IMO. The "wait,
| how many edits back was the error" game is no fun.
| imtringued wrote:
| The compiler is asking the developer to state his intent.
|
| The fact that some programming languages are overly pedantic is
| part of their design.
| rhdunn wrote:
| This is the difference between building a parser for a compiler
| and for an IDE.
|
| For a compiler, you want to stop at the first error. The parser
| can also emit an intermediate representation as it goes, so that
| what it is processing is not necessarily serializable to the
| original code. This makes it difficult to use as a data model for
| tools like IDEs.
|
| For an IDE, you want to process the entire file, recovering from
| errors as you go. This is so that the IDE can keep things like
| function resolution working without turning the entire file red
| as the user is typing code, while ideally only updating the
| references that haven't changed. It also allows the IDE to offer
| different fixes and auto-complete functionality.
|
| This makes it difficult to share parser logic between the two.
| duped wrote:
| It's only difficult to reuse logic from a batch compiler in a
| responsive compiler. It's trivial to derive a batch compiler
| from a responsive one.
|
| You do not actually want to stop at the first error in either
| case. You want to accumulate all the errors at a given phase of
| the compilation and halt. Sometimes that allows other phases to
| progress (for example, you do not want an error in one
| compilation unit to halt the compilation of any other units
| until linking in an incremental compiler).
|
| You do actually want to reuse IRs in the IDE, otherwise it can
| be extremely difficult to get certain things correct (and some
| are next to impossible, like macro expansion/syntax extensions,
| decompilation of libraries, etc).
|
| Unification of the reference compiler and IDE backend is
| extremely desirable, in my opinion. Very few languages take
| that tack (C#/.NET being a major exception) but not because
| it's a _bad_ design - it's because it's _hard._ Writing a
| lexer and parser is easy if you don't care about edits and
| updates. And there are very few parser generators that do make
| that possible (tree sitter being the major exception). And once
| you have a working lexer/parser it is difficult to replace it
| in your compiler, so few language devs ever take that approach.
|
| It's essentially a massive engineering effort in something that
| is rather boring for low payoff in the early life of a language
| implementation; the RoI is only obvious much later when usage
| scales up. So it's unsurprising many languages do not do it
| early on, and like objects, most languages die young.
| curun1r wrote:
| > makes it difficult to share parser logic between the two
|
| It's difficult to reuse traditional compiler logic in an IDE,
| but there's good examples that the reverse isn't true. IDE
| validation of language semantics is a strictly more complex
| problem, but if you start by solving that, it's not as hard to
| add a compiler backend. A compiler's job is to either take
| correct code and translate to its compiled form or take
| incorrect code and report errors. There's no reason an IDE-
| focused parser/compiler can't do both.
|
| IIRC, Microsoft talked publicly about how they built the C#
| compiler as IDE-first and found that it simplified things
| greatly. And I think there have been substantive discussions
| within the Rust community about bringing parts of rust-analyzer
| into the official compiler whereas the RLS approach of reusing
| compiler APIs wasn't able to provide a reasonable IDE
| experience.
| phillipcarter wrote:
| This isn't necessarily correct. Many modern compilers (e.g., C#
| and F# compilers) will do error recovery and keep processing as
| far as they can go, accumulating errors in the process. These
| same compilers are not different from that used in the IDE -
| they are one and the same. And finally, modern compilers can
| also be tuned based on usage, such as enabling batch mode to
| optimize for speed in a single thread or using as many threads
| as are available to optimize for IDE scenarios.
| nicoburns wrote:
| Why don't you want to do the IDE-style parsing in a compiler?
| kkdaemas wrote:
| That can be desirable but there are a few challenges:
|
| - The compiler code becomes more complicated, making
| correctness harder
|
| - The compiler might become slower to run
|
| - Introducing new language features may become harder, again
| due to code complexity
| phillipcarter wrote:
| Compilers will get plenty complicated without IDE scenarios,
| trust me on that one. Slowness is also never really a thing
| to worry about here, especially because usage patterns in
| an IDE vs. a batch process are so different. It's almost
| always the other way around: someone writes something
| that's completely fine for a batch process but tanks IDE
| performance.
| oauea wrote:
| > The compiler code becomes more complicated, making
| correctness harder
|
| > Introducing new languages features may become harder,
| again due to code complexity
|
| It'll be written for IDEs anyway. Might as well reuse if
| possible, right?
| onei wrote:
| I was a little curious about this too. It's contrary to what
| I see in the Go and Rust compilers. My understanding was that
| it's good to have a go at parsing all input if possible so
| the end-user can batch fix mistakes, but it's unreasonable to
| expect error checking in post-parsing steps to occur if there
| are parse errors because the AST is almost certainly
| incomplete.
| mannykannot wrote:
| From time to time, I see errors in IDE parsing. It's not a
| big deal there, but it would be in a compiler or interpreter.
| duped wrote:
| What case would introduce a parsing error in an IDE that
| isn't the case in a compiler?
| mannykannot wrote:
| I figure it is on account of the desirable situation you
| describe in your other post not obtaining: in order to
| satisfy the goals of the IDE, it attempts to go beyond
| where the compiler parser would stop, as the compiler is
| more of a batch than a responsive one, and sometimes the
| IDE gets it wrong. As you say, batch to responsive is the
| difficult way to go.
|
| In addition, I suppose that there are people hard at work
| applying ML in tools to help understand incomplete code
| and mitigate the false positive problem of traditional
| static analysis. I can imagine probabilistic parsing
| being useful in this case, but not so much in compiling.
| hombre_fatal wrote:
| Bad language plugins in an IDE can show you this.
| Sometimes I'll be using a niche language with someone's
| side-project plugin that has some issues even though it's
| correct, like when its file-formatter can't parse the
| code and fails with an error even though it's valid code
| for the compiler.
| duped wrote:
| If the plugins used the same parser as the compiler this
| wouldn't be an issue?
| [deleted]
| shadowgovt wrote:
| All of the points this article makes are good and true, but
| something I've never, ever seen and am wondering why is a
| compiler feedback loop that corrects the code with human input.
|
| The compiler is smart enough to guess what variable I meant when
| I misspell a variable. How come nobody's ever given me a tool to
| close the loop: when the error is reported, ask me to confirm that
| I want my source code edited, and then correct it in place?
| aardvark179 wrote:
| Because the compiler probably doesn't have a stdin it can use
| at that point so you'd need to thread a feedback mechanism
| through the entire build process. You're better off building
| that as a protocol that can be used by IDEs or LSP clients.
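|
| A minimal sketch of the confirm-then-edit loop discussed above,
| assuming the compiler reports each diagnostic together with a
| proposed single-line replacement. The `FixIt` class and the
| `confirm` callback are hypothetical names for illustration, not any
| real compiler's or protocol's schema (LSP code actions carry text
| edits in roughly this spirit).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FixIt:
    """A diagnostic plus a proposed replacement (hypothetical shape)."""
    line: int          # 0-based line index into the source
    start: int         # 0-based column where the replaced span begins
    end: int           # column just past the replaced span
    replacement: str   # text the tool proposes instead
    message: str       # human-readable explanation


def apply_fixits(source: str, fixits: List[FixIt],
                 confirm: Callable[[FixIt], bool]) -> str:
    """Apply only the fixes the user confirms; leave the rest untouched."""
    lines = source.split("\n")
    # Work right-to-left so earlier columns stay valid after an edit.
    for fix in sorted(fixits, key=lambda f: (f.line, f.start), reverse=True):
        if confirm(fix):
            text = lines[fix.line]
            lines[fix.line] = text[:fix.start] + fix.replacement + text[fix.end:]
    return "\n".join(lines)


source = "total = 0\nprint(totl)"
fix = FixIt(line=1, start=6, end=10, replacement="total",
            message="unknown name 'totl'; did you mean 'total'?")

# The front end (terminal prompt, IDE dialog) supplies `confirm`.
accepted = apply_fixits(source, [fix], confirm=lambda f: True)
rejected = apply_fixits(source, [fix], confirm=lambda f: False)
```

| Keeping the prompt outside the compiler (here, the `confirm`
| callback) sidesteps the stdin problem: the compiler only emits
| structured fixes, and whatever front end is available decides how
| to ask.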
| shadowgovt wrote:
| Excellent point. I can't think of a protocol that supports it
| either... Likely one exists and I haven't encountered it.
| callmeal wrote:
| Well that's how you end up with
| https://github.com/mattdiamond/fuckitjs
|
| Javascript Error Steamroller
|
| FuckItJS uses state-of-the-art technology to make sure your
| javascript code runs whether your compiler likes it or not.
|
| Technology
|
| Through a process known as Eval-Rinse-Reload-And-Repeat, FuckItJS
| repeatedly compiles your code, detecting errors and slicing those
| lines out of the script. To survive such a violent process,
| FuckItJS reloads itself after each iteration, allowing the onerror
| handler to catch every single error in your terribly written code.
| js2 wrote:
| > Have you ever heard that phrase about how "Every happy family
| is the same, but every unhappy family is unhappy in their own
| way?"
|
| This really deserves proper attribution: it's the opening
| sentence of _Anna Karenina_ by Leo Tolstoy.
| ouid wrote:
| Tangent ahead. There's the sort of folksy saying "Don't argue
| with an idiot, they'll bring you down to their level and beat
| you with experience" (often attributed to Mark Twain, so I'm
| leaving it anonymous), and I think that these sayings are
| trying to get at the same thing. When someone believes
| something wrong, it could literally be because of any other
| wrong belief that they have. Trying to untangle that structure
| is intractable.
| NateEag wrote:
| > "Don't argue with an idiot, they'll bring you down to their
| level and beat you with experience"
|
| In a further tangent, I've long been fond of a mildly-related
| idiom (whose source I do not know) which instructs the
| listener
|
| "Never wrestle with a pig. You both get dirty and the pig
| likes it."
| gpderetta wrote:
| These days compilers provide fix-it hints that are very useful. I
| find them especially convenient for misspellings and printf
| format string errors, especially when coupled with on the fly
| error checking.
| neilv wrote:
| I recall a classmate (who'd gotten the unobtainable new Apple
| IIgs at the time) mentioning the compiler he was running giving
| an error message, which he described as "I see you forgot a
| semicolon; should I add it for you?" The UCSD Pascal we were
| using on shared school IIe and II+ didn't.
|
| Years later, when I was meeting with Tim Berners-Lee, he wanted
| to see the doc for a Web-related Scheme library I had with me,
| and he started speed-reading it in front of me. The doc had an
| irreverent criticism I'd thrown in, about the practice of overly-
| permissive parsers in Web browsers. In the days of dotcom gold
| rush, when anyone who could spell "HTML programmer" was getting
| truckloads of investment money dumped on them, I'd proposed a
| very prominent angry red browser error indicator for Web pages
| with invalid HTML. I thought that having that could be a
| source of shame, like the creator of it didn't know Web, and all
| the people tossing around money blindly and not knowing who to
| invest in might take that as one indicator. :) (Sir Tim later
| gave a big talk endorsing Python for the Web, but he did
| reference one of my arguments for why I was adopting Scheme at
| the time.)
|
| "Conservative in what you send, liberal in what you accept"
| seemed a good default model for protocol interoperation,
| especially in an environment of legacy systems and imperfectly-
| specified protocols. But Web was new, and HTML was often being
| handwritten, and having the Web browser silently accept invalid
| and often ambiguous HTML without giving any indication it was
| wrong _even during development_ seemed to create an unnecessary
| mess.
|
| I actually had to spend a chunk of last weekend dusting off some
| code to handle that mess, because another open source developer
| was still running into the mess:
| https://www.neilvandyke.org/racket/html-parsing/#%28part._.H...
| Animats wrote:
| I liked that idea in the early days. I wanted to drop back to
| default mode, with default fonts and layout, after displaying
| the first error. But this was when HTML by itself mostly
| defined the page layout.
|
| The HTML5 spec has a long, detailed set of rules for
| consistently parsing bad HTML. They're very funny to read. That
| was the best anyone could do at that late date.
| bwanab wrote:
| My 2nd CS course at college was (I'm showing my age here) PL1
| programming. The Watfiv compiler would correct obvious parse
| errors. Often, this would lead to much more insidious and not-so-
| obvious bugs down the line.
| njacobs5074 wrote:
| The Pascal compiler on our CDC Cyber (used by undergraduate
| courses when I went to NYU) would do this, too. Yeah, was a
| long time ago :)
| julian55 wrote:
| I also remember when compilers made more of an effort to fix
| trivial errors. It was worth it when using a batch system and
| you could only run a few compilations each day.
| vincent-manis wrote:
| I think you're thinking of Cornell's PL/C compiler (Watfiv was
| for Fortran, and didn't have a lot of error-correction), circa
| 1970ish. PL/C would famously convert
|
| PTU LIST('Hello, world!
|
| into a valid program (in fact, the claim was that it would
| never fail to convert any string of text into a valid program).
|
| PL/C made a lot of sense when short student programs were
| entered on punched cards (and hence trivial typos were tedious
| to correct) and batch turnaround times were measured in hours.
| This makes much less sense when (a) editors can give us clues
| about typos right away, e.g., by indenting in a surprising way,
| and (b) compile times for short modules are very short.
| mikerg87 wrote:
| This was tried in the 80s with teaching Pascal compilers. The
| compiler would fix a problem, issue a warning, and continue.
| However, it would carry on with what was left and issue strange
| and bewildering messages to the poor student.
|
| Turbo Pascal at the time would just stop at the first problem, and
| the student could focus on addressing that one and only one issue
| at a time. Yes it was a game of whack-a-mole with syntax errors
| but at least it was a straightforward process to getting
| something to compile.
| agumonkey wrote:
| > game of whack-a-mole with syntax errors
|
| first time I ever "wrote" a program was hand copying one from
| PC Magazine. Knowing nothing about pascal syntax nor semantics,
| what you said describes that whole week of mine.
| ufo wrote:
| I think this is the main issue. Compilers have always been
| trying to do some kind of syntax error recovery, to be able to
| spot more than one syntax error at a time. However, these
| heuristics are unfortunately fragile. It often ends up with
| cascading syntax errors where you're better off ignoring the
| errors after the first one. Not to mention that many error
| recovery heuristics tend to skip over some statements, which
| means they can't be used to "autocorrect" the program.
|
| A while back I looked at how several languages implement this
| and Pascal was actually one of the better ones. It is a very
| hard problem...
| ekidd wrote:
| > Yes it was a game of whack-a-mole with syntax errors but at
| least it was a straight forward process to getting something to
| compile
|
| It also helps that Turbo Pascal was an extremely fast compiler
| for its time. So you could fix one error, rerun the
| compiler, and get another error quickly.
| DancesWTurtles wrote:
| "If the computer knows I'm missing a semicolon here, why won't it
| add it itself?"
|
| The computer does not know that. The computer is being too smart.
| And probably wrong.
| cuteboy19 wrote:
| I want to disagree with him, that sometimes there _is_ an
| unambiguous way for the compiler to solve such errors. But the
| road to hell is paved with good intentions, as anyone working
| with MATLAB will have experienced firsthand. Instead of trying to
| guess what the programmer meant, a compiler should just be a dumb
| machine that does what we tell it to do.
| caditinpiscinam wrote:
| The risk analysis argument makes sense from a language-usage
| perspective. I think there's a language design argument we can
| make as well:
|
| 1) if your language papers over a syntax error, then that error
| is effectively just an alternative syntax
|
| 2) alternative syntaxes make a language more complex
|
| 3) complex languages take more work to implement and more work to
| learn
| DonHopkins wrote:
| PHP has entered the chat.
| goto11 wrote:
| This is what happened with HTML. At first the syntax was pretty
| simple, but the error-correcting parsing meant that invalid
| syntax became widespread. In order to ensure compatibility
| between implementations the spec ended up having to specify the
| exact parsing of any form of incorrect HTML also. Now the
| parsing algorithm of HTML is incredibly complex:
| https://html.spec.whatwg.org/multipage/parsing.html
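|
| A small illustration of lenient HTML handling, using Python's
| standard `html.parser`. This module is a forgiving tokenizer, not an
| implementation of the WHATWG parsing algorithm linked above, and the
| `EventLog` class is a name made up for this sketch. The point is
| only that mismatched markup is reported as ordinary events with no
| error raised.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record start/end tag events so we can see what the parser accepted."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


log = EventLog()
# Second <p> opens before the first closes; </b> closes a tag never opened.
log.feed("<p>one<p>two</b></p>")
log.close()

for event in log.events:
    print(event)
```

| Nothing here tells the page author that the markup was wrong, which
| is the situation described above: silence during development, and a
| spec that later has to define what every such input means.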
| amelius wrote:
| > Because, as smart as we compiler designers think we are, you,
| dear programmer, know your program better than we do.
|
| Copilot, however, begs to differ.
| naniwaduni wrote:
| There's a lot of hubris in ML!
| mcculley wrote:
| Some languages are more redundant than others. I remember at
| least one Ada compiler that would make an assumption about what
| you meant, correct the internal representation, and keep going.
| It would fail with an error, but still allowed one to push
| further through the compilation process. This was a big help
| given how slow the compilers were.
| Someone wrote:
| I strongly disagree. Compilers should autocorrect "obvious" parse
| errors. I also am not aware of any that doesn't.
|
| What they shouldn't do is produce a binary based on their (smart
| or stupid) guesses about the programmer's intention.
|
| That allows you to compile, fix multiple typos, compile, instead
| of compile, fix one typo, compile, fix the next typo, compile,
| etc, _and_ prevents you from running a program that you didn't
| write.
|
| I am not aware of any compiler that doesn't do this, as it would
| be extremely annoying to have a compiler give up at the first
| error.
|
| The search term to use is parser error recovery. It doesn't give
| obviously great hits, though. Sample hits:
|
| - https://www.geeksforgeeks.org/what-is-error-recovery/
|
| - https://cs.adelaide.edu.au/~charles/lt/Lectures/07-ErrorReco...
|
| - https://en.wikipedia.org/wiki/Burke-Fisher_error_repair
|
| - https://en.wikipedia.org/wiki/LR_parser#Syntax_error_recover...
| Retr0id wrote:
| > I also am not aware of any that doesn't.
|
| Can you give an example of CPython, or GCC/clang autocorrecting
| a parse error?
| borodi wrote:
| They do auto correction while parsing. If you have C code
| and forget a semicolon or something else, it will try to
| guess a way to fix it in order to parse the rest of the
| program and give you the other possible errors it found.
| There is even the -Wfatal-errors flag to disable this
| functionality.
| chrisseaton wrote:
| Have you ever seen GCC or Clang give more than one error
| message for a program? They're able to provide more than one
| error message because they correct the first error, and then
| continue parsing.
| vincent-manis wrote:
| For some value of "correct". Actually, it's often more like
| skipping a few tokens until some kind of synchronization
| point (e.g., a semicolon) is reached. It's good manners for
| a compiler that does this to refuse to produce a binary.
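|
| The "skip to a synchronization point" strategy is often called
| panic-mode recovery. A minimal sketch for a toy language of
| `name = number ;` statements follows; the grammar and the function
| name `parse_statements` are assumptions for illustration, not how
| GCC or Clang are implemented.

```python
from typing import List, Tuple


def parse_statements(tokens: List[str]) -> Tuple[int, List[str]]:
    """Parse `name = number ;` statements.

    Returns (count of well-formed statements, error messages).
    """
    errors: List[str] = []
    parsed = 0
    pos = 0
    while pos < len(tokens):
        stmt = tokens[pos:pos + 4]
        ok = (len(stmt) == 4 and stmt[0].isidentifier() and stmt[1] == "="
              and stmt[2].isdigit() and stmt[3] == ";")
        if ok:
            parsed += 1
            pos += 4
            continue
        errors.append(f"syntax error near token {pos} ({tokens[pos]!r})")
        # Panic mode: discard tokens up to and including the next ';'.
        while pos < len(tokens) and tokens[pos] != ";":
            pos += 1
        pos += 1
    return parsed, errors


tokens = "a = 1 ; b = = 2 ; c = 3 ; d 4 ; e = 5 ;".split()
parsed, errors = parse_statements(tokens)
print(parsed, errors)

# A missing ';' makes the recovery swallow the following statement too.
swallowed, swallowed_errors = parse_statements("a = 1 b = 2 ;".split())
```

| Both bad statements are reported in one run, but nothing is
| repaired: the skipped tokens are simply dropped, which is why a
| parser like this should not go on to produce a binary.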
| zauguin wrote:
| As a clang++ demonstration see
| https://godbolt.org/z/bTMT6qd4f. The failing `static_assert`
| uses the variable `i` which comes from the line missing a
| semicolon. It only reaches this assert because the compiler
| internally fixed the first error.
| gpderetta wrote:
| Not sure why you are being downvoted but you are completely
| right.
|
| A few years ago GCC wasn't as good at error recovery, so the
| "too many errors, bailing out" message was a common occurrence
| (code for Internal Compiler Error, but managed to print at
| least one diagnostic). Today it is much much rarer to encounter
| it and using the compiler is a much more pleasant experience.
| phillipcarter wrote:
| Unfortunately this isn't so simple. Autocorrecting to something
| could result in picking something that's actually wrong and
| impacting whole program semantics, either causing no errors
| when there should be errors, or causing errors when there
| should be none. That could be extremely confusing for new
| developers.
|
| Instead, compiler authors need to understand and prioritize
| good ergonomics. Diagnostics should be accurate, come with
| suggestions, have unique error codes you can look up, and
| follow patterns you can predict over time.
| syrrim wrote:
| As long as compilation is instant, or nearly so, recompiling
| after each error fixed is likely preferable. The reason being
| that the error correction is frequently flawed, so that
| anything beyond the first error is usually suspect, and often
| will disappear after the first error is fixed. The novice
| programmer may not realize this, and thus become quite confused
| when they try to diagnose the later errors, only to realize
| they don't exist.
| seanwilson wrote:
| > Because, as smart as we compiler designers think we are, you,
| dear programmer, know your program better than we do.
|
| When you compile a file successfully, make an edit, then
| compilation fails, are there any compilers/IDEs that compare the
| before-and-after of the file to create better error messages? The
| compiler would have a lot of extra information this way because
| files usually change gradually and not all at once.
|
| I'm thinking of cases where you're refactoring some code but miss
| out a bracket, the compiler says the missing bracket could be
| anywhere down the bottom of the whole file but anyone watching
| intuitively knows what block of code the missing bracket likely
| falls into, usually localised around what you just edited.
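|
| A rough sketch of that idea, assuming the tool keeps a copy of the
| last version of the file that compiled. The function name
| `changed_regions` is hypothetical; it only narrows down where a
| parser might look first and is not a feature of any existing
| compiler.

```python
import difflib
from typing import List, Tuple


def changed_regions(last_good: str, current: str) -> List[Tuple[int, int]]:
    """Return 1-based (first_line, last_line) ranges of `current` that
    differ from the last version that compiled."""
    matcher = difflib.SequenceMatcher(
        a=last_good.splitlines(), b=current.splitlines(), autojunk=False)
    regions = []
    for tag, _a1, _a2, b1, b2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        # For a pure deletion b1 == b2; report the position where the
        # lines were removed as a one-line range.
        regions.append((b1 + 1, max(b2, b1 + 1)))
    return regions


last_good = "def f(x):\n    return (x + 1)\n\ndef g(y):\n    return y * 2\n"
current = "def f(x):\n    return (x + 1\n\ndef g(y):\n    return y * 2\n"

regions = changed_regions(last_good, current)
print(regions)
```

| A diagnostic such as "unclosed bracket" could then be ranked toward
| the reported region instead of the end of the file.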
|
| > elsif say_goodbye nd we_like_this_person:
|
| > If the compiler tried to automatically add a colon, I'd have
| two colons and the code is even wronger.
|
| Couldn't a smarter compiler guess that because
| "we_like_this_person" and "say_goodbye" are defined variables and
| there's no variables similar looking to "nd", that "nd" should
| probably be the "and" keyword?
|
| I'm surprised by how unhelpful error messages still are for most
| tools. I'm curious how much this is because it's a very hard
| problem rather than it's a neglected area that developers accept
| as normal. I heard Elm is meant to be good here (where strong
| static typing allows for certain kinds of hints):
| https://elm-lang.org/news/compiler-errors-for-humans
| MereInterest wrote:
| > because files usually change gradually and not all at once.
|
| As a counterexample, suppose you are checking out a new version
| of a file, or a new branch with many changes across many files.
| Identifying this usage would require the compiler to be aware
| of the version control system, and still wouldn't correctly
| identify that the version sent from $COWORKER via email for
| some weird reason isn't a gradual change.
|
| For me personally, debugging is difficult enough without
| needing to worry that the compiler is going to maintain state
| across multiple runs. If I see an error message that is
| different at all, I assume that means I'm triggering a
| different failure mode, and debug accordingly.
|
| Edit: That said, the Rust compiler is tremendous with error
| messages, without relying on time-dependent state. If a
| variable is misspelled, it will look for similarly named
| variables that are in scope, and ask if you meant one of them.
| But this behavior is still consistent for a given file and
| compiler version.
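|
| A minimal sketch of a stateless "did you mean" suggestion, using
| Python's `difflib.get_close_matches`. This is not rustc's actual
| algorithm; the `suggest_name` helper and the 0.6 cutoff are
| assumptions for illustration.

```python
import difflib
from typing import Iterable, Optional


def suggest_name(unknown: str, in_scope: Iterable[str]) -> Optional[str]:
    """Return the in-scope name most similar to `unknown`, or None."""
    matches = difflib.get_close_matches(
        unknown, list(in_scope), n=1, cutoff=0.6)
    return matches[0] if matches else None


scope = ["total", "count", "index"]
hit = suggest_name("totl", scope)
miss = suggest_name("zzz", scope)

if hit is not None:
    print(f"unknown name 'totl'; did you mean '{hit}'?")
```

| The result depends only on the misspelled name and the names in
| scope, so the same file always produces the same suggestion, and
| the tool only suggests rather than silently substituting.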
| scythe wrote:
| Couldn't you just gate the comparison with compiler flags and
| define corresponding make targets? Flags could also identify
| the VCS and how the old file should be obtained.
| [deleted]
| jancsika wrote:
| > For me personally, debugging is difficult enough without
| needing to worry that the compiler is going to maintain state
| across multiple runs.
|
| Ooh, the idea makes me shudder.
|
| I remember looking at a project's makefile which called a
| custom build script where the README said to run the
| buildscript twice-- once to generate some state, and a second
| time to compile stuff using that state.
|
| Without any comments provided, the makefile called the custom
| buildscript _three_ times in a row.
|
| I can't even imagine the superstition and cargo culting that
| would arise from an IDE "helping out" by analyzing who has
| changed what, when, and in what order they changed it.
|
| _Please paste a new empty function named "momo" here before
| doing a release. Also make sure your blinds are closed before
| compiling._
| avar wrote:
| Seems relatively easy to implement by having your "make" step
| commit to git, then on a compilation error show the diff along
| with the error.
|
| Then simply rebase out the intermediate steps before pushing.
| abecedarius wrote:
| > compare the before-and-after of the file to create better
| error messages?
|
| There was a PhD thesis I read in the 90s that included a
| version of this idea. I forget the specifics.
| nneonneo wrote:
| For Python specifically, the colon is always used to introduce a
| block, which makes parsing somewhat easier as well as being
| consistent throughout the language.
|
| However, one could easily imagine a design where certain keywords
| automatically introduce a block after the current line, which
| would eliminate the need for the colon. It would prevent one-
| liners (e.g. "if x: y") but that's no big loss. The colon would
| continue to be used for e.g. lambda, dict and annotation syntax.
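|
| For reference, a short check of how CPython treats the missing
| colon today, using the standard `ast` module. The helper name
| `try_parse` is made up for this sketch, and the exact error text
| varies between Python versions, so it is printed rather than
| assumed.

```python
import ast


def try_parse(source: str) -> str:
    """Return 'ok' if the source parses, else a short error summary."""
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"SyntaxError on line {err.lineno}: {err.msg}"
    return "ok"


with_colon = "if x:\n    y = 1\n"
without_colon = "if x\n    y = 1\n"

result_ok = try_parse(with_colon)
result_bad = try_parse(without_colon)
print(result_ok)
print(result_bad)
```

| The parser rejects the second program outright instead of inferring
| the block from the indentation, even though the indentation makes
| the intent fairly clear to a human reader.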
| carlhjerpe wrote:
| I'd like to be able to run the compiler in interactive mode where
| it asks to apply the fix along with the diff.
| kazinator wrote:
| If you correct errors without failing the compilation due to a
| nonzero error count, then you've essentially forked the language.
| What is an error in the standard language is a _de facto_
| nonconforming extension in your implementation of it, and the
| users will be in for a nasty surprise when they try to port their
| code to another implementation.
|
| Compilers used to correct obvious parse errors a lot more than
| they do now. The goal wasn't to make the program pass compilation
| so that the user can ignore the error messages. That would be
| harmful, as noted above. The goal is to be able to continue
| processing the program and uncover more errors in it in a useful
| way.
|
| There is a gamble there:
|
| - if you make a good correction to the token stream, all is well:
| you can diagnose more errors later in a pertinent way.
|
| - if the correction is wrong, then the compiler may emit a flurry
| of nonsense errors caused by the correction, so that only the
| first diagnostic makes any sense.
|
| There is a third risk:
|
| - the correction may lead to looping. This risk exists in any
| correction that lengthens the token sequence. The compiler may
| have to quit when the error count reaches some defined maximum.
| The looping may otherwise be infinite, or possibly unpredictable
| in length (think Hailstone Sequence).
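|
| The gamble and the error cap can be sketched with a toy parser.
| This is only an illustration under assumed rules, not how any
| real compiler does it: the grammar (statements of the form
| `NAME = NUMBER ;`), the cap MAX_ERRORS and the function parse are
| all hypothetical. A missing ';' is "corrected" by pretending it
| was there; anything else triggers a skip to the next ';'.

```python
MAX_ERRORS = 5  # hypothetical cap; real compilers pick their own limit


def parse(tokens):
    """Parse `NAME = NUMBER ;` statements and return a list of diagnostics."""
    errors = []
    pos = 0

    def expect(matches, what):
        # Consume one token if it matches; otherwise record a diagnostic.
        nonlocal pos
        if pos < len(tokens) and matches(tokens[pos]):
            pos += 1
            return True
        found = tokens[pos] if pos < len(tokens) else "end of input"
        errors.append(f"token {pos}: expected {what}, found {found!r}")
        return False

    while pos < len(tokens) and len(errors) < MAX_ERRORS:
        ok = expect(str.isidentifier, "a name")
        ok = ok and expect(lambda t: t == "=", "'='")
        ok = ok and expect(str.isdigit, "a number")
        if ok and not expect(lambda t: t == ";", "';'"):
            # Correction by insertion: act as if the ';' had been present.
            continue
        if not ok:
            # Panic mode: skip to just past the next ';' and resynchronise.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return errors


# Good guess: one missing ';' yields exactly one diagnostic.
print(parse("x = 1 y = 2 ;".split()))

# Bad guess: the stray '2' is not a new statement, so the inserted ';'
# produces a second, misleading diagnostic.
print(parse("x = 1 2 ; y = 3 ;".split()))

# Garbage input: the cap stops the flurry of diagnostics.
print(len(parse([";"] * 20)))
```

| This sketch always consumes at least one token per iteration, so
| it cannot loop forever; the cap only bounds the noise. A recovery
| scheme that inserts tokens without consuming any has no such
| guarantee, which is where the cap becomes a safety net.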
|
| In the 1970's, _Creative Computing_ magazine conducted a contest
| to see who could produce the most error messages using the least
| amount of code.
|
| The reason old-time compilers tried to correct as many errors as
| possible in a single run is that the programmers didn't always
| have use of the computer; they had to produce the program using
| keypunch equipment onto punched cards, and then line up at a job
| submission window, where an operator would submit their card deck
| for execution. You wouldn't want to line up to fix one semicolon
| at a time.
| jokoon wrote:
| Because no software is good at removing ambiguity. Only humans,
| or maybe AI, are good at detecting ambiguity and removing it. It
| requires previous experience, something computers cannot draw on
| accurately enough. Programming languages are a bit like real
| languages: they require human intelligence.
|
| And there are still edge cases that could be ambiguous to humans,
| so you definitely want any compiler to refuse ambiguous programs.
| Computers are mathematical machines, they do everything without
| asking for your permission, so you better pray their behavior is
| well defined.
|
| Look at what happens when languages are ambiguous, like
| JavaScript or HTML: they become hard to use, and JS engines are
| monsters whose workings you don't want to understand. I'm not a
| fan of C++ and its difficulty, but it's my favorite language
| because it's well defined.
|
| Maybe compiler engineers could demonstrate how inserting
| semicolons in some places could create undesirable situations.
| Writing parsers is one of the toughest programming tasks, in my
| view.
|
| Rules in languages don't exist for nothing. Even duck typing has
| a cost. It's like deciding that people can drive anywhere on the
| road, and let people decide how to avoid each other. Sure they
| can, and it would work 99% of the time, but 99% of the time is
| not good enough.
| dplgk wrote:
| I don't think Python will tell you you're missing a semicolon.
| silon42 wrote:
| I prefer an explicit pass.
| hannob wrote:
| This is somewhat related to the "robustness principle" ("be
| liberal in what you accept, and conservative in what you send")
| that guided internet development in the early days. It was about
| protocol data rather than programming languages, but the issue is
| similar.
|
| Yet it turned out that accepting malformed input introduces a lot
| of subtle security issues. Today many people have come to the
| conclusion that the robustness principle was a mistake:
| https://www.ietf.org/archive/id/draft-iab-protocol-maintenan...
| eklavya wrote:
| My understanding is that, if the compiler can with 100%
| confidence correct a mistake or a missing label then that means
| the language has a redundant/useless syntactical appendage and we
| can just get rid of it. That has happened with ";": some newer
| languages no longer require it.
| charcircuit wrote:
| Regarding the final example: I don't think a double colon would
| ever be valid, so why would the "compiler" suggest making that
| change?
| jpeanuts wrote:
| A place where autocorrect might be considered is in REPLs. Out of
| habit I still regularly write "print 'a'" in the Python REPL
| although I've been using Python 3 for a while. You get:
|
| SyntaxError: Missing parentheses in call to 'print'. Did you mean
| print("a")?
|
| Well yes... obviously... so please just print it.
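|
| For reference, a small sketch that reproduces the behaviour
| described above outside the REPL. It only uses the built-in
| compile(); the helper name try_compile is made up. The exact
| wording of the message depends on the Python 3 release, so the
| sketch prints whatever the interpreter reports rather than
| assuming a particular text.

```python
def try_compile(source: str):
    """Return the SyntaxError raised for `source`, or None if it parses."""
    try:
        compile(source, "<repl>", "exec")
        return None
    except SyntaxError as err:
        return err


err = try_compile("print 'a'")
if err is not None:
    # The parser recognises the Python 2 form well enough to suggest a
    # fix, but it still refuses to run the statement.
    print("SyntaxError:", err.msg)

# The corrected form parses without complaint.
print(try_compile("print('a')") is None)
```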
| DancesWTurtles wrote:
| That would mean adding a rule to the language. The rule of "you
| can use 'print' as a statement". A rule that, time after time,
| has been shot down.
|
| Which leads us to the real issue at hand: if the compiler is
| going to do anything by itself, that means it is following
| well-defined rules. Therefore, whatever automatic thing the
| compiler does is part of the language. And, sometimes, the design
| rules of said language plainly and simply do not allow for that.
| goto11 wrote:
| Be careful what you wish for. Parsers guessing at the author's
| intention is what gave us HTML.
| phendrenad2 wrote:
| Okay so autocorrect is a bad idea. But what I'm tearing my hair
| out trying to figure out is why can't compilers even _detect_ an
| "obvious" error? Why does GCC often output cryptic general-
| purpose errors about the WRONG line of code, when I have a simple
| typo? Why can't it have a simple rules engine that detects common
| typos in syntax and suggests a fix to me, right in the error
| message?
| phillipcarter wrote:
| It can be hard to say, since things can get wildly complicated
| depending on language semantics (especially when type inference
| is involved). But yeah, there is often a tendency in compilers
| to react very poorly to a typo and freak out over what comes
| _after_ it instead of going, "hmmm, this doesn't look
| complete". In some languages, this can be because the typo is
| actually ambiguous with something that's fine, just not in the
| context of the code that follows. But every compiler and
| language is different, so it's hard to say.
| gbolcer wrote:
| That is a great point. Eclipse for Java has, in the past 5+
| years, started autocompleting the beginnings of expressions. As
| someone who started programming in vi, it's really enjoyable. It
| doesn't do everything, but strings, statement completions, loops
| and scopes, even variable naming and selection, etc. all
| autocomplete. I realize that's not quite the same as autocorrect,
| but it works
| well. Second, with the OpenAI GPT-3 stuff, it does generate
| syntactically correct code, though I haven't learned to trust it
| enough to generate semantic instantiations of what I'm thinking.
| WalterBright wrote:
| Language syntax is redundant so that compilers can detect errors.
| If there was no redundancy in a language, every random stream of
| bytes would be a valid program.
|
| It's like having dual-path redundancy in an airplane avionics
| system. If they disagree, then it is clear there's a fault in one
| of them - but it doesn't mean you can tell _which_ one is
| erroneous. Without redundancy, there's no way to detect faulty
| operation.
|
| Guessing which parse is correct, or which avionics subsystem is
| correct, is as bad as no redundancy at all.
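|
| A toy illustration of the redundancy argument, under made-up
| assumptions: run_dense interprets a hypothetical language in
| which every byte is a valid instruction (2-bit opcode, 6-bit
| operand), so no byte stream can ever be rejected. Python's own
| grammar is used for contrast via the built-in compile(); the
| helper is_valid_python is likewise invented for this sketch.

```python
def run_dense(program: bytes) -> int:
    """Toy language with no redundancy: every byte is a valid instruction."""
    acc = 0
    for byte in program:
        opcode, operand = byte >> 6, byte & 0x3F
        if opcode == 0:
            acc += operand
        elif opcode == 1:
            acc -= operand
        elif opcode == 2:
            acc *= operand
        else:
            acc = operand
    return acc


def is_valid_python(source: str) -> bool:
    """Report whether `source` matches Python's grammar."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


intended = bytes([0xC5, 0x03])  # set 5, then add 3
typo = bytes([0xC5, 0x43])      # one flipped bit: set 5, then subtract 3

# Both byte strings run; the typo silently computes something else.
print(run_dense(intended), run_dense(typo))

# A redundant grammar can at least notice that something is off.
print(is_valid_python("total = (1 + 2"))
```

| The dense language never reports an error because it has nothing
| to cross-check against. The unbalanced parenthesis is caught only
| because the grammar requires a closing token that carries no
| extra meaning of its own.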
| hzhou321 wrote:
| > Language syntax is redundant so that compilers can detect
| errors.
|
| Unfortunately, programming language syntax is not as redundant
| as we would wish. When the code is syntactically valid, the
| compiler can parse it. When it isn't, the compiler doesn't even
| have a valid base from which to parse any information out of the
| code. The compiler writer may assume a common cause for a certain
| parsing error and insert some "meaningful" error messages, but
| that is very different from the compiler knowing anything. The
| "redundant" information is carried in an out-of-band channel
| (human vocabulary and common patterns) rather than in the syntax.
| WalterBright wrote:
| If code can fail to match the grammar for the language, that is
| redundancy in the grammar.
|
| The compiler can (and does, for error messages and error
| recovery) guess at what was meant, but it cannot know what
| was meant.
| hzhou321 wrote:
| I think you are thinking of "redundant" in terms of
| information. A syntax does not carry any information on its
| own. It is a frame for information to reside in.
| ch_123 wrote:
| An old-timer professor at my university used to tell the story of
| the university's use of the PL/C compiler from Cornell[1] which
| promised to automatically correct syntax errors in students'
| code. This was back in the days of punched cards and next-day
| compilation times, and it was hoped that the PL/C compiler would
| reduce the amount of compute time spent compiling bad code.
| Instead, it would end up turning poorly thought-out code into
| code which would crash the system or cause endless loops. Its use
| was discontinued after a short time.
|
| [1] https://en.wikipedia.org/wiki/PL/C
| hzhou321 wrote:
| The compiler "guessed" your errors. This does not mean the
| compiler "knows" your errors. A rephrasing of your question is:
| why don't compilers always apply the "guessed" correction to
| your code? This is because the compiler doesn't know, and you
| won't know, when it may guess wrong. And when it guesses wrong,
| you will have very, very mysterious errors at best, and very,
| very mysterious wrong application behavior (that doesn't even
| fail) at worst.
| vincent-manis wrote:
| Actually, when I was teaching, I used to see a student strategy
| I called "obey the compiler", which was to fix whatever the
| compiler complained about, without thinking. If the compiler
| said "semicolon expected at col 42", the student would put a
| semicolon at column 42. If the compiler complained "undeclared
| identifier prnit_results", the student would declare a name
| prnit_results. The strategy, when followed to extremes, could
| convert an almost-valid program into a string of nonsense
| (post-conversion was when the student came to me for help),
| more or less the opposite of PL/C's strategy, which I mentioned
| in an earlier comment. To be fair, this strategy was mostly
| found among first-year and a few weaker second-year students.
| hzhou321 wrote:
| I think the JavaScript compiler will act on its guesses in the
| semicolon cases. Not everyone is happy about it.
___________________________________________________________________
(page generated 2022-04-10 23:01 UTC)