[HN Gopher] C# Raw String Literal Proposal
       ___________________________________________________________________
        
       C# Raw String Literal Proposal
        
       Author : nikbackm
       Score  : 62 points
       Date   : 2022-02-17 13:26 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | ducharmdev wrote:
       | > This allows code to look natural, while still producing
       | literals that are desired, and avoiding runtime costs if this
       | required the use of specialized string manipulation routines.
       | 
       | Maybe a source generator would be applicable here? Not as nice
       | as having it built into the language of course, but it would at
       | least eliminate these runtime costs.
        
       | ape4 wrote:
       | Perl's heredoc is better. You get to pick the delimiter.
       | https://perlmaven.com/here-documents
        
       | DalekBaldwin wrote:
       | Interpolation is its own can of worms, but if you just want to be
       | able to encode absolutely anything without escape characters, you
       | just need two delimiters:
       | 
       |     "*n  [stringA]  "*n
       |     "*n  [stringB]  '*n
       |     '*n  [stringC]  "*n
       |     '*n  [stringD]  '*n
       | 
       | Each string may contain runs of contiguous single or double
       | quotes of length less than n, and furthermore:
       | 
       | stringA may start and/or end with a single quote
       | 
       | stringB may start with a single quote and/or end with a double
       | quote
       | 
       | stringC may start with a double quote and/or end with a single
       | quote
       | 
       | stringD may start and/or end with a double quote
        
       | LandR wrote:
       | var v = """"""
       |         contents"""""
       |         """"""
       | 
       | lol.
       | 
       | I like the proposal to have these sort of raw strings, where
       | indentations are removed, but can't they use a symbol before the
       | string like they do with interpolation `$` or literals `@`?
       | 
       | I know it says design decision to go with 1 more " than the
       | longest sequence of " in the string, but why ?
        
         | brandmeyer wrote:
         | I'm a fan of the C++ method.
         | 
         |     R"SQL(my string without the sequence S Q L goes here)SQL"
         | 
         | You can use any extra delimiter you want. The concatenation
         | rules make it easy to insert source code line breaks and
         | indentation without literal string breaks or indentation.
        
           | Metasyntactic wrote:
           | Hi, I'm the lang designer here.
           | 
           | We looked into this. However, there didn't seem to be any
           | benefit to this above just the N-quote version (which fits
           | into how C# does strings everywhere else). In the above case,
           | the `SQL(` and `)SQL` tokens are just akin to N-quotes. Since
           | there's no additional benefit, we went with the simpler
           | approach that solves all these needs, but will look the same
           | across all codebases.
        
         | fmorel wrote:
         | It's more about not needing to escape characters than stripping
         | indentation (that's just an extra perk). Otherwise, if the
         | string can contain `"`, how can the compiler know which `"`
         | defines the end of it?
        
         | taco_emoji wrote:
         | They literally explain why as design goal #1:
         | 
         | > Provide a mechanism that will allow all string values to be
         | provided by the user without the need for any escape-sequences
         | whatsoever.
        
         | GordonS wrote:
         | I'm in agreement: the function is needed, but the syntax feels
         | really weird.
         | 
         | I'd also much prefer some kind of prefix, maybe double "at",
         | e.g.
         | 
         | ``` var myString = @@"blah blah blah blah " ```
         | 
         | This feels a lot more natural to me.
        
           | Someone wrote:
           | A design goal is that you won't need to escape _any_
           | character sequence in the string. In your proposal, a _"_
           | inside a literal string would have to be escaped.
           | 
           | In practice, using _"""_ will be sufficient almost all the
           | time.
        
         | Metasyntactic wrote:
         | > but can't they use a symbol before the string like they do
         | with interpolation `$` or literals `@`?
         | 
         | Hi! I'm the language designer here :)
         | 
         | > I know it says design decision to go with 1 more " than the
         | longest sequence of " in the string, but why ?
         | 
         | Because if we use a symbol before the string, then there needs
         | to be some mechanism to escape it within the string. e.g. if
         | you use `@` literals, you still need to escape quotes within
         | the string literal. The point of this feature (which we try to
         | spell out in the spec) is so that you can have content without
         | the need to escape anything at all.
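The "one more quote than the longest run in the content" rule described above can be sketched in a few lines. This is only an illustration of the rule as stated in the thread, written in Python rather than C#; `raw_delimiter` is a hypothetical helper name, and the minimum of three quotes is taken from the proposal's `"""` form.

```python
import re


def raw_delimiter(content: str) -> str:
    """Return the shortest quote run that can delimit `content`.

    Assumption from the thread: the delimiter is one quote longer than the
    longest run of quotes inside the content, and never shorter than three.
    """
    longest = max((len(run) for run in re.findall(r'"+', content)), default=0)
    return '"' * max(3, longest + 1)


# Content with a run of three quotes needs a four-quote delimiter.
print(len(raw_delimiter('a """ b')))  # prints 4
```

Because the delimiter grows with the content, no character sequence inside the literal ever needs an escape, which is the design goal quoted earlier in the thread.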
        
         | mastax wrote:
         | Rust uses `r` followed by any number of `#`, e.g.
         | 
         |     r###"There can be "stuff without escapes" #" "# "###
         | 
         | C# already uses @ for raw string prefixes so they could extend
         | its usage for multiple prefixes+suffixes.
         | 
         | """ Is fine though.
        
       | chubot wrote:
       | _To make the text easy to read and allow for indentation that
       | developers like in code, these string literals will naturally
       | remove the indentation specified on the last line when producing
       | the final literal value._
       | 
       | This is the same rule that Oil has; I think it came from the
       | Julia language (or at least that's where I got it from)
       | 
       |  _Oil Has Multi-line Commands and String Literals_
       | http://www.oilshell.org/blog/2021/09/multiline.html
        
       | StevePerkins wrote:
       | I like the three-quote literal syntax in general, and am a bit
       | surprised that C# doesn't have this already. Even Java has had
       | this for a while now!
       | 
       | But I don't like the indented form, where nested triple-quotes
       | are ignored. Whitespace formatting is fine when I'm working with
       | Python, but I really don't want to mix that paradigm when I'm
       | working with curly-brace languages.
        
         | taco_emoji wrote:
         | Nested triple-quotes are not ignored, they will end the string
         | if you used triple to start. If you need to include triple
         | quotes within the string, you start with quadruple.
        
       | KingOfCoders wrote:
       | With all the hacks and exploits going on, and (Log4J) more
       | security awareness coming,
       | 
       | I've been considering a safe String class that prevents some
       | characters like CR,LF,\ that are seldom needed in business
       | strings but used in system level things. Drawing a line between
       | these two would increase security.
        
         | bdamm wrote:
         | safe would be more a property of where it came from, and maybe
         | what processing the string has had on it. Plenty of business
         | strings have newlines.
         | 
         | I like the idea but really I want the type to capture
         | unsafe/semi-safe/safe. Now the trick is, how expressive can the
         | idea of semi-safe be?
        
           | KingOfCoders wrote:
           | Joel tried to fix the problem / source with naming (I
           | remember reading that article, and it's 17 years old!)
           | 
           | https://www.joelonsoftware.com/2005/05/11/making-wrong-
           | code-...
        
         | ajnin wrote:
         | I think it's a good approach but I would do it the opposite way
         | : create an UnsafeString and use that whenever your program
         | takes external inputs. Then make it so that this UnsafeString
         | can't be used directly but must always be consciously converted
         | to (safe) String when using that data anywhere.
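A minimal sketch of the "unsafe by default, convert consciously" idea from this subthread, in Python. The class name `UnsafeString`, the method `to_safe`, and the default forbidden set (CR, LF, backslash, taken from the parent comment) are all hypothetical choices for illustration, not an existing library API.

```python
class UnsafeString:
    """Wraps external input so it cannot be used as a normal string by accident."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def __str__(self) -> str:
        # Implicit use (str(), f-strings, print) is refused on purpose.
        raise TypeError("UnsafeString must be converted with to_safe()")

    def to_safe(self, forbidden: str = "\r\n\\") -> str:
        """Return a plain str, rejecting any of the forbidden characters."""
        bad = sorted(set(self._raw) & set(forbidden))
        if bad:
            raise ValueError(f"forbidden characters present: {bad!r}")
        return self._raw


user_input = UnsafeString("plain business text")
clean = user_input.to_safe()  # explicit, conscious conversion
```

The policy here is deliberately crude (reject rather than sanitise); as noted above, plenty of business strings contain newlines, so a real policy would likely need to be configurable per use.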
        
       | __ryan__ wrote:
       | I've given some thought to this kind of string literal in the
       | past (for my imaginary programming language). I want a syntax
       | _something_ like this:
       | 
       |     var xml = """<element attr="content">
       |               """    <body>
       |               """    </body>
       |               """</element>
       |               """;
       | 
       | This would give you the string:
       | 
       |     <element attr="content">
       |         <body>
       |         </body>
       |     </element>(no newline)
       | 
       | If you wanted a newline at the end, you'd do this:
       | 
       |     var xml = """<element attr="content">
       |               """    <body>
       |               """    </body>
       |               """</element>
       |               """
       |               """;
       | 
       | Basically the end delimiter of the string would be the last """.
       | You could concatenate two strings like so:
       | 
       |     var xml = """<element attr="content">
       |               """    <body>
       |               """    </body>
       |               """</element>
       |               """
       |               """ // this string ended on this line
       |               + """<element attr="content">
       |               """    <body>
       |               """    </body>
       |               """</element>
       |               """
       |               """; // this string ended on this line
       | 
       | This could use the same logic for using _at least_ three quotes
       | as the indicator that it's a multiline string.
       | 
       | Please, tear this apart and offer improvements.
       | 
       | Edit: this is conceptually similar to Zig's multiline literal:
       | https://ziglang.org/documentation/master/#Multiline-String-L...
        
         | wangweij wrote:
         | One benefit of multiline raw string is that you can directly
         | copy/paste a block of characters between the program and its
         | source. Unless there is very sophisticated IDE support, this
         | proposal does not work well in this sense.
        
           | jcelerier wrote:
           | ... which text editor does not have multi-line block
           | selection these days ?
        
             | pjob wrote:
             | The browser that I'm using to read this proposal, for one.
        
           | __ryan__ wrote:
           | Ah, that's interesting. In general, yes my proposal does
           | benefit from and assume some IDE/tooling support for quality
           | of life.
           | 
           | Edit: To be specific, the IDE could handle formatting when
           | pasting into a line beginning with """. Or offer a "paste as
           | _cool new multiline string syntax_" feature.
        
             | [deleted]
        
         | skrebbel wrote:
         | I don't understand what problem this solves that the proposal
         | in the linked article doesn't solve. Maybe you're not
         | suggesting that it does, but then what's the upside?
        
           | __ryan__ wrote:
           | Oh yeah, I was just sharing my string literal syntax that's
           | been baking in my head for a while for the sake of discussing.
           | 
           | But off the top of my head, mainly just that there's a clear
           | visual indicator of the start of lines of text, rather than
           | counting/lining up leading whitespace. In the first example,
           | the strings are all tabbed evenly for the sake of looking
           | "pretty" in the code, but the following would generate the
           | same string, since each line begins after the """:
           | 
           |     var xml = """<element attr="content">
           |                   """    <body>
           |           """    </body>
           |       """</element>
           |     """;
        
             | Metasyntactic wrote:
             | Hi. I'm the language designer behind this feature :)
             | 
             | A few points.
             | 
             | > but the following would generate the same string, since
             | each line begins after the """:
             | 
             | That's not a virtue here. The point is to be able to write
             | clear literals that never need escapes and which allow for
             | easy grokking of what the content actually is.
             | 
             | All current string forms in C# require some amount of
             | manual (or tooling) help to fix them up to be legal. That's
             | not the case with this literal. The content can always work
             | as-is without having to touch it at all.
        
               | __ryan__ wrote:
               | That definitely makes sense. In my (limited, especially
               | not C#) experience I just really dislike reasoning about
               | trimming the leading whitespace, even if the rules are
               | simple. I yearn for a consistent visual cue.
               | 
               | In my syntax, the IDE would ideally treat the """ block
               | virtually like a <textarea>.
        
               | Metasyntactic wrote:
               | > I just really dislike reasoning about trimming the
               | leading whitespace
               | 
               | Note: this feature is entirely optional. You can
               | absolutely not have leading whitespace trimming at all.
               | Indeed, this is a requirement of the proposal as we have
               | to make it possible to actually represent text that has
               | leading whitespace :)
        
       | hankchinaski wrote:
       | I have been temporarily working with c# for the past month after
       | years of Go. It's a different philosophy to Go, there is a lot of
       | syntactic sugar and magic spells that make your life easy... but
       | I don't know if I prefer that to the Go way of doing things. I
       | was pleasantly surprised tho. Much better experience than working
       | with Java
        
         | radicalbyte wrote:
         | I've had to go the other way around.. and I find Go extremely
         | verbose compared to C# - as long as you're not forced to follow
         | certain constraints (SonarQube-driven-development).
        
           | StevePerkins wrote:
           | It sounds like you're both saying the same thing.
        
             | radicalbyte wrote:
             | Exactly, we've come from opposite sides and reached the
             | same conclusion :-)
        
         | dustymcp wrote:
         | Dotnet core is great, the earlier versions not so much.
        
       | monadmoproblems wrote:
       | I've long wanted a more succinct way of writing implicitly typed
       | arrays. Whenever you work with data directly in the code, for
       | example when hacking on leetcode, you end up with lots of
       | horrible nested arrays:
       | 
       | new [] {new [] {1, 2}, new [] {3, 4}};
       | 
       | Something like:
       | 
       | @[ @[1, 2], @[3, 4] ]
        
         | Metasyntactic wrote:
         | Hi there! I'm one of the C# language designers. I'm working on
         | a proposal for that right now:
         | https://github.com/dotnet/csharplang/issues/5354
         | 
         | Thanks!
        
       | chhickman wrote:
       | I would much rather see something like this:
       | 
       |     string longString =
       |                       `This
       |                       `  allows
       |                       `    differentiation
       |                       `  of
       |                       `formatting
       |                       `  indention
       |                       `    from
       |                       `  leading
       |                       `    string
       |                       `       spaces
       |                       `  using
       |                       `  back-ticks (\`)
        
         | Metasyntactic wrote:
         | Hi there, I'm the lang designer and implementor here.
         | 
         | That would violate a core goal of the feature which is that the
         | content itself doesn't need escaping. This sort of approach
         | would require all users to have tooling that would make that
         | pleasant, instead of providing a feature that was easy to use
         | across any editor.
         | 
         | Thanks!
        
       | jimworm wrote:
       | Heredoc by another name...?
        
         | LandR wrote:
         | Doesn't heredoc preserve the indentation unless you strip it
         | back out? I don't mean indentation within the string, I mean
         | the level of indentation of where it is in the code.
        
           | sandreas wrote:
           | You can use it without indentation using <<- AND TAB (does
           | not work with spaces, so copy and paste won't work on
           | hackernews - replace the spaces of the three content lines
           | with a TAB):
           | 
           |     cat <<-EOF
           |         content
           |         not
           |         indented
           |         EOF
        
           | jimworm wrote:
           | The perl-family heredoc syntax could be quite flexible with
           | options. I'm used to ruby which does have an option for
           | indent-stripping.
        
       | torginus wrote:
       | It seems like .NET is trying to outcompete Rust in the number of
       | string types available in the language.
        
         | ok123456 wrote:
         | Rust's different strings are different types with different
         | underlying memory semantics and representations.
         | 
         | This is just some syntactic sugar for strings that contain
         | escape codes. It's still just a 'string'.
        
           | tialaramex wrote:
           | Also, the Rust language itself only has one string type, str.
           | std::string::String comes ultimately from Rust's alloc crate,
           | it is special only in the limited sense that the prelude
           | makes it available without specifically asking for it, but
           | you could define your own prelude that introduces say MyText
           | or CPlusPlusStyleString or whatever you wanted.
           | 
           | Admittedly having one string type is still more than C, or
           | indeed C++ bother with but we might notice that those
           | languages have a pretty terrible relationship with strings
           | and suspect that's not a coincidence.
        
         | sbelskie wrote:
         | With potentially more on the way!
         | 
         | https://github.com/dotnet/csharplang/blob/main/proposals/utf...
        
         | doodpants wrote:
         | These aren't different string types, they're different syntaxes
         | for string literals.
        
       | radicalbyte wrote:
       | The issue is being discussed here:
       | https://github.com/dotnet/csharplang/issues/4304
        
       | jsd1982 wrote:
       | For the single-line case, what happens for:
       | """""""         """"""""
       | 
       | Are those strings containing `"` and `""` or are they empty
       | strings? Is the first case an error because the starting and
       | ending quote counts do not match? If the number of quote chars is
       | even, do the contents alternate between `"` and empty as the
       | number of surrounding quotes increases?
        
         | rawling wrote:
         | It does say
         | 
         | > A single_line_raw_string_literal cannot represent a string
         | value that starts or ends with a quote (") though an
         | augmentation to this proposal is provided in the Drawbacks
         | section that shows how that could be supported.
         | 
         | so I'd assume the odd count would lead to an error (single
         | trailing "?) rather than a string containing ".
         | 
         | E: I'd assume it doesn't allow empty single line strings
         | because otherwise how do you tell the difference between that
         | and the start of a multi line one?
        
       | exyi wrote:
       | I hope it will also normalize newlines to `\n`. The current
       | version of raw literals (@"...") just puts there whatever is in
       | the file, so in practice it depends on whether your program was
       | compiled on Windows or Linux. Surely that should be irrelevant
       | for the compilation to intermediate language.
        
         | Metasyntactic wrote:
         | Hi, I'm the lang designer and implementor here.
         | 
         | We absolutely do not normalize newlines as that would defeat
         | the purpose of _raw_ literals. The point here is that your
         | content is not interpreted as that's the pain area that people
         | are hitting today. How you write your literal is what you get
         | at the end of the day.
         | 
         | Note: if the content needs to be `\n` then just use that actual
         | newline in the code. WRT the file line endings and whatnot,
         | my recommendation is that you never use tools that arbitrarily
         | change that behind your back as it does _already_ have impact
         | _today_ in C#. For example, that will break standard `@""`
         | strings today.
         | 
         | If your line endings are important, then your tools should be
         | set up to respect what you wrote and not change them. All
         | editors can be set up this way, as can git. And that would
         | absolutely be my recommendation on how you should structure
         | things for your code if newlines are relevant.
        
         | gpderetta wrote:
         | Surely it depends on the encoding of the source, not where it
         | was compiled?
        
           | exyi wrote:
           | Yes it does, but usually \n is committed in git, but on
           | Windows it checks out as \r\n. So you are right that it
           | technically does not depend on the system, but in practice
           | there is a difference.
        
             | gpderetta wrote:
             | That's a very good point. Is it best practice to configure
             | git to change line endings on Windows? I understand that
             | these days Windows editors can handle unix line terminators
             | correctly.
        
           | rawling wrote:
           | I think you're right:
           | 
           | > Any line breaks within verbatim string literals are part of
           | the resulting string. If the exact characters used to form
           | line breaks are semantically relevant to an application, any
           | tools that translate line breaks in source code to different
           | formats (between "\n" and "\r\n", for example) will change
           | application behavior.
           | 
           | https://docs.microsoft.com/en-us/dotnet/csharp/language-
           | refe...
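Since the literal keeps whatever line breaks the source file contains, code that needs one specific form can normalise explicitly at run time. A small Python sketch of that idea (not C#, and `normalize_newlines` is a hypothetical helper); the two sample strings stand in for the same literal as checked out with LF and with CRLF line endings.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line breaks to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The same source literal, as it would read after two different checkouts.
checked_out_with_lf = "line one\nline two"
checked_out_with_crlf = "line one\r\nline two"

# Without normalisation the two values differ; with it they agree.
assert checked_out_with_lf != checked_out_with_crlf
assert normalize_newlines(checked_out_with_crlf) == checked_out_with_lf
```

This costs a run-time pass over the string, which is the trade-off against the advice above to configure git and editors so that line endings are never rewritten.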
        
       | captainmuon wrote:
       | Nice, I'm surprised this is not already in the language. The only
       | thing I find a bit strange is that the delimiters must be on
       | separate lines (unless it is the special one-line-form). So this
       | is apparently not legal:
       | 
       |     var s = """This is a
       |         multiline string""";
       | 
       | Requiring the start and especially end quotes to be on a separate
       | line makes it take a lot of vertical space. But OTOH, that is
       | consistent with the default coding style in C# which is
       | vertically verbose (with {} on lines by themselves).
        
         | Someone1234 wrote:
         | > I'm surprised this is not already in the language.
         | 
         | Because it is already in the language.
         | 
         |     var xml = @"
         |          <element attr=""content"">
         |               <body>
         |               </body>
         |          </element>";
         | 
         | This proposal mostly seems to be about some edge case where @"
         | " syntax isn't good enough. But really, this whole thing is an
         | improvement to an anti-pattern, and you should instead be
         | looking into not needing multi-line block specific string
         | literals in your code (e.g. putting templates in their own
         | files/resources).
        
           | tialaramex wrote:
           | Microsoft's "verbatim strings" aren't. Think of them instead
           | as "Oops, we use a lot of backslashes here at Microsoft and
           | over time that just looks more and more stupid" strings and
           | then these actual raw strings make lots more sense than those
           | did.
           | 
           | There's plenty of stuff in a middle ground where a separate
           | template file is a waste. Your example, now that you've
           | corrected it, shows that nicely. A separate template would be
           | a waste here for these few bytes, and yet "verbatim strings"
           | mean instead of this just being some actual XML you can
           | copy-paste, it has to be escaped / unescaped.
           | 
           | If your issue is that you don't think string literals should
           | be a thing at all, C# is the wrong language for you. Try one
           | of the early numeric languages, or something modern like
           | WUFFS that eschews strings entirely because they're too
           | dangerous. Once you accept that literals should be a thing
           | (notice these aren't interpolated, they're just literals)
           | this is an obvious idea.
           | 
           | The bad "verbatim" syntax should go away in favour of a raw
           | literal syntax such as the one proposed here.
        
           | maybeOneDay wrote:
           | You've perfectly demonstrated one motivation for this
           | proposal: your string literal is incorrect. Verbatim strings
           | in C# require " to be escaped, your string should be:
           | 
           |     var xml = @"
           |           <element attr=""content"">
           |             <body>
           |             </body>
           |           </element>";
        
             | Someone1234 wrote:
             | All it perfectly demonstrates is that this is inherently an
             | anti-pattern and that we're discussing features to work
             | around things you shouldn't be doing to begin with.
             | 
             | If you want to store XML literals, then by all means do so,
             | but within the code itself is inappropriate. Even the
             | existing @" " syntax is a code-smell, the new syntax
             | doesn't address why that is (e.g.
             | validation/colorization/etc don't work for string literals
             | containing arbitrary other languages).
             | 
               | .Net already has constructs to allow the dynamic creation
               | of XML blocks (and JSON) without resorting to string-concat
               | shenanigans.
        
               | maybeOneDay wrote:
               | Anti-patterns are rarely as absolute as you're making
               | this out to be. Sure, I agree, lots of times it's better
               | to store xml or json literals not in code. But for
               | something three lines long it's perfectly fine, more
               | readable, and trivial. This new proposal makes it elegant
               | to do so, the only issue is that the @"" syntax should
               | never have been used and unfortunately now we are
               | proposing a third string literal syntax. That I don't
               | like.
        
         | Metasyntactic wrote:
         | Hi. I'm the designer of this feature. The reason for this is so
         | that we can potentially have fenced string blocks in the
         | future. for example:
         | 
         | ```c#
         | var s = """xml
         |     <Book><title/></Book>
         |     """;
         | ```
         | 
         | and the like. Thanks!
        
       | assbuttbuttass wrote:
       | I always found these complex indentation-stripping rules to be
       | confusing in a language such as Python. I was under the
       | impression that C# doesn't treat whitespace significantly, so why
       | do they need all these complex rules? Just interpret what's
       | between the quotes literally.
        
         | Deukhoofd wrote:
         | When you have multiline strings, and don't want to start the
         | lines with whitespace, it means you need to break the
         | indentation of your code, which can look quite ugly. I'm not
         | sure whether I want the compiler to do it like in the proposal
         | though, I feel like it can easily cause unintentional issues.
        
           | assbuttbuttass wrote:
           | I think breaking the indentation is the lesser of two evils,
           | because it becomes very obvious what the content of the
           | string is. I've never minded it too much personally in go,
           | but I guess it's subjective.
        
           | LandR wrote:
           | When I have strings like this, currently I find myself doing
           | something like
           | 
           |     var foo = "<foo>" + Environment.NewLine +
           |               "    <bar>" + Environment.NewLine +
           |               "    <quax>" + Environment.NewLine +
           |               "</foo>";
           | 
           | This would fix stuff like this nicely, but it's horrible
           | syntax I think.
        
             | Deukhoofd wrote:
             | That would also not be a compile time constant, which would
             | probably be desired in cases like that.
        
               | tremon wrote:
               | Why would it not be a compile-time constant? Can't every
               | compiler worth its salt evaluate constant expressions at
               | compile time nowadays?
        
               | Deukhoofd wrote:
               | Environment.NewLine depends on the machine it's run on,
               | not on the compiler, so at best it can only be evaluated
               | at runtime.
        
         | LandR wrote:
         | > C# doesn't treat whitespace as significant
         | 
         | It doesn't for code, it does for strings.
         | 
         | If you have:
         | 
         |     var foo = @"<foo>
         |                     <bar>
         |                     <quax>
         |                 </foo>";
         | 
         | That string is actually
         | 
         |     <foo>
         |                     <bar>
         |                     <quax>
         |                 </foo>
         | 
         | In raw string form, as per the proposal, the string would be:
         | 
         |     <foo>
         |         <bar>
         |         <quax>
         |     </foo>
         | 
         | It allows you to write strings nicely inline, and the
         | indentation in the string itself doesn't matter.
        
       | a9h74j wrote:
       | In place of all of the special rules for handling indentation, I
       | wonder if they could simply define some extra starting chars
       | (besides $$""" for controlling interpolation) to indicate
       | suppress-leading-newline or suppress-ending-newline etc. Offhand
       | this would seem more explicit than implicit, and be searchable
       | (unlike a pattern).
       | 
       | Other than that, ++ for any mechanism to quote to arbitrary
       | depth. I have imagined
       | 
       | [abcfoo[ ...anything but ]abcfoo]... ]abcfoo]
       | 
       | as another approach.
        
       | crispyambulance wrote:
       | I got confused at the first example.
       | 
       |     var xml = """
       |               <element attr="content">
       |                 <body>
       |                 </body>
       |               </element>
       |               """;
       | 
       | And then they say that xml gets this...
       | 
       |     <element attr="content">
       |       <body>
       |       </body>
       |     </element>
       | 
       | But they _don't_ explicitly say if the new lines after and
       | before the """'s are considered part of the literal string or
       | not.
       | 
       | Are they?
        
         | rawling wrote:
         | No, although you're right, it doesn't look like they make it
         | clear (although the example is in a little code element which
         | presumably doesn't have leading or trailing newlines).
         | 
         | Later on,
         | 
         | > In the case of multi_line_raw_string_literal the initial
         | whitespace* new_line and the final new_line whitespace* is not
         | part of the value of the string.
        
         | sbelskie wrote:
         | The construct seems to be defined as:
         | 
         | ```
         | multi_line_raw_string_literal
         |     : raw_string_literal_delimiter whitespace* new_line
         |       (raw_content | new_line)*
         |       new_line whitespace* raw_string_literal_delimiter
         |     ;
         | ```
         | 
         | Which I think says that the opening and closing new lines
         | (after and before the """'s) are NOT part of the content of the
         | string literal, but new lines between them can be part of the
         | string literal.
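         | 
         | That reading can be checked with a small sketch, assuming a
         | compiler that implements the proposal (the names here are
         | hypothetical):

```csharp
using System;

static class NewlineDemo
{
    // Assumes raw string literal support. The newline after the opening """
    // and the newline before the closing """ delimit the literal; they are
    // not content.
    public static readonly string OneLine = """
        text
        """;

    // A blank line between the delimiters is content.
    public static readonly string WithBlank = """
        first

        second
        """;

    // The literal picks up the source file's line endings, so normalize before comparing.
    public static string Normalize(string s) => s.Replace("\r\n", "\n");

    static void Main()
    {
        Console.WriteLine(OneLine == "text");
        Console.WriteLine(Normalize(WithBlank) == "first\n\nsecond");
    }
}
```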
        
         | Metasyntactic wrote:
         | Hi. I'm the designer of this lang feature. The specification
         | covers this. However, to be clear, neither the newline after
         | the first `"""` nor the newline before the last `"""` is part
         | of the literal. Thanks!
        
         | laurensr wrote:
         | In Java the leading [1] newlines are stripped as well as the
         | trailing ones [2].
         | 
         | [1]:
         | https://cr.openjdk.java.net/~jlaskey/Strings/TextBlocksGuide...
         | 
         | [2]:
         | https://cr.openjdk.java.net/~jlaskey/Strings/TextBlocksGuide...
        
       | billpg wrote:
       | A little while ago, I discovered that...
       | $"A{new List<string>{$"B{"{C}"}D"}.First()}E"
       | 
       | ... was valid C#.
       | 
       | This means that a C# compiler can't start with a simple
       | tokenizing loop. That compiler phase would have to keep track of
       | state in a stack, recording what each } character means while
       | it's still looping through code character-by-character.
       | 
       | Now we're adding {{ and }} into the equation. Yay.
        
         | torginus wrote:
         | >This means that a C# compiler can't start with a simple
         | tokenizing loop.
         | 
         | Not true, it just means that the parts between double quotes
         | aren't bunched into a single token.
         | 
         | To figure out how the compiler makes sense of this code, try
         | 
         | https://roslynquoter.azurewebsites.net/
         | 
         | You'll see how it tokenizes the string.
        
           | billpg wrote:
           | That's not a _simple_ tokenizer.
           | 
           | A simple loop that would have worked with the 70s era of
           | programming languages, would go through each character and
           | once the boundary between two tokens has been identified,
           | write out a token to a one-dimensional list. This would be a
           | mostly stateless loop, tracking enough state for the current
           | token in hand only. The _next_ phase would go through the
           | tokens and pair up brackets, etc.
           | 
           | A C# tokenizer can't do that. It needs to keep a stack of
           | state. When it sees a '}', it needs to know if that's a
           | "normal" brace or the } that resumes an interpolated string
           | literal.
           | 
           | I was writing a tokenizer myself and I wanted to have
           | something similar to string interpolation. I very quickly
           | realized my simple loop that I would have written for my CS
           | degree isn't going to cut it and I had to start over.
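           | 
           | A rough sketch of that bookkeeping, not how Roslyn does it:
           | a scanner that keeps a stack so it can say, for each }
           | outside a plain string, whether it closes a code block or an
           | interpolation hole. It ignores comments, char literals and
           | verbatim strings, and every name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class BraceClassifier
{
    public enum Close { Block, Hole }

    // What each open construct on the stack is.
    enum Frame { Text, Hole, Block }

    // Reports, in source order, what every '}' outside plain string literals closes.
    public static List<Close> Classify(string src)
    {
        var stack = new Stack<Frame>();
        var result = new List<Close>();
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            bool inText = stack.Count > 0 && stack.Peek() == Frame.Text;
            if (inText)
            {
                if (c == '\\') { i += 2; continue; }                  // escaped character
                bool doubled = i + 1 < src.Length && src[i + 1] == c;
                if ((c == '{' || c == '}') && doubled) { i += 2; continue; } // {{ or }} escape
                if (c == '{') stack.Push(Frame.Hole);                 // start of a hole
                else if (c == '"') stack.Pop();                       // end of the $"..." string
            }
            else if (c == '$' && i + 1 < src.Length && src[i + 1] == '"')
            {
                stack.Push(Frame.Text);                               // start of a $"..." string
                i += 2;
                continue;
            }
            else if (c == '"')
            {
                // Plain string: skip to the closing quote, honoring backslash escapes.
                i++;
                while (i < src.Length && src[i] != '"') i += src[i] == '\\' ? 2 : 1;
            }
            else if (c == '{') stack.Push(Frame.Block);
            else if (c == '}' && stack.Count > 0)
                result.Add(stack.Pop() == Frame.Hole ? Close.Hole : Close.Block);
            i++;
        }
        return result;
    }

    static void Main()
    {
        // The expression quoted earlier in the thread, escaped as a regular literal.
        var sample = "$\"A{new List<string>{$\"B{\"{C}\"}D\"}.First()}E\"";
        Console.WriteLine(string.Join(", ", Classify(sample)));
    }
}
```

           | 
           | For the nested expression from earlier in the thread this
           | should report Hole, Block, Hole; the } inside "{C}" is
           | skipped because it sits in a plain string.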
        
             | Metasyntactic wrote:
             | Hi, I'm one of the C# language designers, and I work on the
             | compiler implementation as well.
             | 
             | C# has never had a "simple tokenizer". Indeed, even the
             | first version of the language had complex lexical
             | constructs that are part and parcel of the language. For
             | example, our _comments_ can store structured data in them
             | (like XML).
             | 
             | > A simple loop that would have worked with the 70s era of
             | programming languages
             | 
             | Yes. But 70s era compilers had to deal with things like not
             | having enough memory to even store basic amounts of data.
             | They also had to work in spaces where things like a 'stack'
             | were just not tenable. We're literally 50 years from that
             | point, and having a compiler keep a stack is not an issue
             | anymore :)
        
         | Metasyntactic wrote:
         | >Now we're adding {{ and }} into the equation. Yay.
         | 
         | Hi, I'm the lang designer and feature implementor here :)
         | 
         | The complexity of lexing/parsing did not get worse here. We
         | actually just lex/parse this stuff the same way that
         | interpolated strings have always been lexed/parsed. This has
         | been supported in the language for almost 10 years at this
         | point :)
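         | 
         | A small sketch of how the brace rules compare, assuming a
         | compiler that implements the proposal (the method names are
         | made up): in a classic interpolated string {{ is an escape
         | for a literal brace, while in a $$""" string it opens a hole.

```csharp
using System;

static class InterpolationDemo
{
    // Classic interpolated string: {{ and }} are escapes for literal braces,
    // and a single { starts a hole.
    public static string Classic(string name) => $"{{ \"name\": \"{name}\" }}";

    // Raw interpolated string with $$ (assumes the proposed syntax): a single
    // { or } is literal text, and {{ ... }} delimits a hole.
    public static string Raw(string name) => $$"""{ "name": "{{name}}" }""";

    static void Main()
    {
        Console.WriteLine(Classic("Ada"));
        Console.WriteLine(Raw("Ada"));
    }
}
```

         | 
         | Both calls should produce the same text, { "name": "Ada" }.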
        
       | intrasight wrote:
       | I did not understand the xml indentation examples
        
         | Metasyntactic wrote:
         | Hi there! I'm the lang designer and I wrote up that spec. Could
         | you clarify what you didn't understand about the indentation
         | examples? I can work on clarifying them. Thanks!
        
           | intrasight wrote:
           | You say "If the indentation behavior is not desired, it is
           | also trivial to disable like so:"
           | 
           | And the code sample you show differs only in that closing
           | quote isn't indented. You don't explain why and how that
           | change would affect the generated string.
        
             | jodrellblank wrote:
             | It says " _these string literals will naturally remove the
             | indentation specified on the last line when producing the
             | final literal value._ "
             | 
             | Each line in the literal will have leading whitespace
             | trimmed off, up to where the closing quotes are.
             | 
             | (What happens if the closing quotes pass some of the text?)
        
               | Metasyntactic wrote:
               | > (What happens if the closing quotes pass some of the
               | text?)
               | 
               | That's an error. Called out here:
               | https://github.com/dotnet/csharplang/blob/main/proposals/raw...
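               | 
               | The trimming rule and the error case can be modelled
               | roughly like this. It is a sketch of the behaviour as
               | described in this thread, not the compiler's code, and
               | the helper name and signature are hypothetical.

```csharp
using System;
using System.Linq;

static class DedentModel
{
    // contentLines: the lines between the delimiters.
    // closingIndent: the whitespace in front of the closing """.
    public static string Dedent(string[] contentLines, string closingIndent)
    {
        var trimmed = contentLines.Select(line =>
        {
            if (line.StartsWith(closingIndent, StringComparison.Ordinal))
                return line.Substring(closingIndent.Length);
            if (string.IsNullOrWhiteSpace(line))
                return "";   // assumption: whitespace-only lines may be shorter than the indent
            // Text starts to the left of the closing quotes: modelled as an error.
            throw new FormatException("Line is indented less than the closing quotes: " + line);
        });
        return string.Join("\n", trimmed);
    }

    static void Main()
    {
        var lines = new[] { "    <a>", "      <b/>", "    </a>" };
        Console.WriteLine(Dedent(lines, "    "));  // closing quotes indented by four spaces
        Console.WriteLine(Dedent(lines, ""));      // closing quotes at column 0: nothing removed
    }
}
```

               | 
               | The second call is the "disable" case: with the closing
               | quotes at column 0 there is no indentation to remove.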
        
             | Metasyntactic wrote:
             | >And the code sample you show differs only in that closing
             | quote isn't indented. You don't explain why and how that
             | change would affect the generated string.
             | 
             | Hi there. This is explained in the spec in a few places. In
             | the examples section it explicitly states:
             | 
             | > To make the text easy to read and allow for indentation
             | that developers like in code, these string literals will
             | naturally remove the indentation specified on the last line
             | when producing the final literal value.
             | 
             | > If the indentation behavior is not desired, it is also
             | trivial to disable like so:
             | 
             | I thought that was clear, as the prior explanation says that
             | we remove the indentation from the last line. And then I
             | show how you can disable it: as you noted, by no longer
             | indenting the closing quote line. Cheers!
        
       | gwbas1c wrote:
       | ... Why?
       | 
       | There are so many different ways to do strings in C#. Adding
       | features like this just makes the language harder to learn, and
       | the compiler harder to implement.
       | 
       | At this point, it's probably better to adjust the compiler to
       | make it easier to turn a text file into a hardcoded string. The
       | embedded resource approach works, but it could be significantly
       | smoother.
       | 
       | Or, maybe the compiler needs some form of a plugin architecture
       | so people who want obscure features can figure out how to add
       | them?
        
         | Metasyntactic wrote:
         | Hi. I'm the lang designer and feature implementor.
         | 
         | To your question of "why?", we tried to cover the reasoning in
         | the proposal. But the core reason is that today people use
         | strings a ton. And in many cases it's unpleasant to do so
         | because you always end up with reasons that you need to escape
         | the content. This escaping serves to satisfy the compiler, but
         | really doesn't add value for the user the majority of the time.
         | The idea here is that you can just use a raw string and say:
         | here's the content, exactly as I want it.
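         | 
         | As an illustration of the escaping cost, a sketch assuming a
         | compiler that implements the proposal; the JSON snippet and
         | the names are invented for the example.

```csharp
using System;

static class EscapingDemo
{
    // Regular literal: every quote and backslash in the content must be escaped.
    public static readonly string Escaped =
        "{ \"path\": \"C:\\\\temp\", \"pattern\": \"\\\\d+\" }";

    // Raw literal (assumes the proposed syntax): the content is written
    // exactly as it should appear.
    public static readonly string Raw =
        """{ "path": "C:\\temp", "pattern": "\\d+" }""";

    static void Main()
    {
        Console.WriteLine(Raw);
        Console.WriteLine(Escaped == Raw);
    }
}
```

         | 
         | The two literals hold the same value; only the raw one can
         | be pasted into a JSON file unchanged.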
        
         | eterm wrote:
         | The proposal says why: all those different ways require
         | escaping.
        
         | radicalbyte wrote:
         | Because it allows us to literally embed other languages within
         | C# and provide full refactoring and tooling support. Next
         | level stuff this (and something which should be normal, it's
         | 2022 ffs!).
        
       ___________________________________________________________________
       (page generated 2022-02-17 23:01 UTC)