[HN Gopher] World's Smallest CSV Parser (C#)
       ___________________________________________________________________
        
       World's Smallest CSV Parser (C#)
        
       Author : vilark
       Score  : 19 points
       Date   : 2024-04-10 21:21 UTC (1 hour ago)
        
 (HTM) web link (github.com)
        
       | cb321 wrote:
       | Don't just parse; convert. If you like, do it in a pipeline
       | that turns CSV into split-parseable data, e.g. with this possibly
       | smaller, faster, and more general converter:
       | https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim
       | (Ideally, convert all the way to an mmap-and-go binary format
       | like nio so you don't have to re-parse.)
        
       | DanielBryars wrote:
       | What's the utility of defining the "Error" exception? Why not use
       | an existing one, say InvalidOperationException, or a plain
       | Exception? Is making your own better practice?
        
         | gnabgib wrote:
         | There _is no_ utility. It's perhaps written for JavaScript
         | developers who are used to Error, but it's not idiomatic C#.
         | It might also be a sign of Copilot use.
         | 
         | The use of a class-scoped `StringBuilder` that only one method
         | uses, and `ReadQuotedColumn`/`ReadNonQuotedColumn` yielding one
         | character at a time rather than accepting the builder, isn't
         | a good sign either (for efficiency). Nor is casting everything
         | to a `char` (this won't support UTF8), or assuming that an end
         | quote followed by anything (:71) is a valid way to end a field.
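To make the efficiency and end-quote points concrete, here is a hedged sketch of a quoted-field reader; it is not the project's code. It appends directly into a caller-supplied `StringBuilder` instead of yielding one `char` at a time. It also rejects a closing quote that is not followed by a delimiter, line break, or end of input. Following the exception question above, it throws the built-in `FormatException`. `ReadQuotedField` and its signature are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical reader for one RFC 4180-style quoted field. Assumes the opening
// quote has already been consumed from `reader`. Characters are appended directly
// to the caller's StringBuilder, so no per-character iterator is involved.
// Returns the character that ended the field (delimiter, '\r', '\n') or -1 at end of input.
static int ReadQuotedField(TextReader reader, StringBuilder field, char delimiter = ',')
{
    while (true)
    {
        int c = reader.Read();
        if (c == -1)
            throw new FormatException("Unterminated quoted field.");
        if (c != '"')
        {
            field.Append((char)c); // quoted fields may contain delimiters and line breaks
            continue;
        }
        int next = reader.Read();
        if (next == '"')
        {
            field.Append('"'); // "" is an escaped quote
            continue;
        }
        // A closing quote must be followed by a delimiter, a line break or end of input.
        if (next == delimiter || next == '\r' || next == '\n' || next == -1)
            return next;
        throw new FormatException($"Unexpected character '{(char)next}' after closing quote.");
    }
}

var sb = new StringBuilder();
int end = ReadQuotedField(new StringReader("a \"\"b\"\", c\",next"), sb);
Console.WriteLine($"{sb} | terminator: {(char)end}"); // a "b", c | terminator: ,
```

A caller can keep one `StringBuilder` per parser and `Clear()` it between fields, which is the reuse the reply below describes.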
        
           | neonsunset wrote:
           | C# `char` is a UTF-16 code unit, not a byte; a byte is
           | simply `byte`.
           | 
           | Having StringBuilder be a private field on the parser
           | instance is not an issue either; it is simply reused.
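A small C# illustration of the `char` point; the sample text is arbitrary. A `StreamReader` decodes UTF-8 bytes into UTF-16 `char`s, so reading `char` by `char` does not by itself break non-ASCII text. Characters outside the Basic Multilingual Plane simply occupy two `char`s (a surrogate pair).

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

// "é,😀\n" written with escapes so the source file's own encoding doesn't matter.
byte[] utf8 = Encoding.UTF8.GetBytes("\u00E9,\U0001F600\n");

// The reader decodes UTF-8 bytes into UTF-16 code units (char).
using var reader = new StreamReader(new MemoryStream(utf8), Encoding.UTF8);
string line = reader.ReadLine()!;

Console.WriteLine(utf8.Length);  // 8 bytes: é = 2, ',' = 1, emoji = 4, '\n' = 1
Console.WriteLine(line.Length);  // 4 chars: é = 1, ',' = 1, emoji = 2 (surrogate pair)
Console.WriteLine(char.IsHighSurrogate(line[2]) && char.IsLowSurrogate(line[3])); // True

// Splitting on an ASCII delimiter never cuts a surrogate pair, because
// surrogate code units (U+D800..U+DFFF) can never equal ','.
string[] fields = line.Split(',');
Console.WriteLine(fields[1] == "\U0001F600");                      // True
Console.WriteLine(new StringInfo(fields[1]).LengthInTextElements);  // 1
```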
        
       | lolpanda wrote:
       | I reviewed the code in this project and it looks pretty
       | reasonable. I thought CSV was a loosely specified format,
       | though. In my experience, moving data from one system to
       | another via CSV has never gone smoothly. I had a lot of trouble
       | with Snowflake -> CSV -> Clickhouse. I now use JSONL for pretty
       | much everything.
        
         | buybackoff wrote:
         | There are the specs and there is the real world. More often
         | than not, the specs are on the far side of the moon: you never
         | see them in real life. Oh, so many hours of my life were wasted
         | on that. Real-world CSVs are as loosely specified as any free
         | text in a notepad.
        
       | neonsunset wrote:
       | To counterbalance this, an extremely fast CSV parser (also
       | written in C#, uses SIMD and multi-threading):
       | https://github.com/nietras/Sep/
       | 
       | P.S.: unfortunately, this one is yet another parser that uses
       | the "drain char-by-char into a list/vec" approach, which is very
       | inefficient, plagues many languages, and keeps the parser from
       | taking advantage of the vectorized string.Split. But other than
       | that, I'm happy more people are noticing .NET.
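A hedged sketch of the kind of fast path the comment alludes to; it is not how Sep works internally. If a line contains no quote character, `string.Split` produces the fields directly (the comment above describes it as vectorized). Only lines with quotes fall back to a per-character state machine. `ParseLine` is a hypothetical name. The sketch assumes one record per line, so quoted fields containing line breaks are out of scope. Its slow path is also deliberately lenient about text after a closing quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical line parser with a fast path: lines without quotes go straight to
// string.Split; only quoted lines use the slower per-character state machine.
// Assumes one record per line (no line breaks inside quoted fields).
static string[] ParseLine(string line, char delimiter = ',')
{
    if (line.IndexOf('"') < 0)
        return line.Split(delimiter); // fast path: bulk search for delimiters

    var fields = new List<string>();
    var field = new StringBuilder();
    bool inQuotes = false;
    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (inQuotes)
        {
            if (c != '"') field.Append(c);
            else if (i + 1 < line.Length && line[i + 1] == '"') { field.Append('"'); i++; } // "" escape
            else inQuotes = false; // closing quote
        }
        else if (c == '"') inQuotes = true;
        else if (c == delimiter) { fields.Add(field.ToString()); field.Clear(); }
        else field.Append(c);
    }
    if (inQuotes) throw new FormatException("Unterminated quoted field.");
    fields.Add(field.ToString());
    return fields.ToArray();
}

Console.WriteLine(string.Join(" | ", ParseLine("1,plain,row")));          // 1 | plain | row
Console.WriteLine(string.Join(" | ", ParseLine("2,\"a, \"\"b\"\"\",c"))); // 2 | a, "b" | c
```

The design choice is simply to pay for quote tracking only on lines that need it. Files where most rows are unquoted then spend most of their time in the bulk search.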
        
       ___________________________________________________________________
       (page generated 2024-04-10 23:00 UTC)