[HN Gopher] World's Smallest CSV Parser (C#)
___________________________________________________________________
World's Smallest CSV Parser (C#)
Author : vilark
Score : 19 points
Date : 2024-04-10 21:21 UTC (1 hour ago)
(HTM) web link (github.com)
| cb321 wrote:
| Don't just parse - convert. Do it in a pipeline, if you like, to
| split-parseable data, e.g. with the possibly smaller, faster, and
| more general: https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim
| (And, ideally, convert all the way to an mmap & go binary format
| like nio so you don't have to re-parse.)
| DanielBryars wrote:
| What's the utility of defining the "Error" exception? Why not use
| an existing one, say InvalidOperationException, or a plain
| Exception? Is making your own better practice?
| gnabgib wrote:
| There _is no_ utility. It's perhaps written for JavaScript
| developers who are used to Error, but it's not idiomatic C#.
| Might be indicative of a copilot too.
|
| The use of a class-scoped `StringBuilder` that only one method
| uses, and `ReadQuotedColumn`/`ReadNonQuotedColumn` yielding one
| character at a time rather than accepting the builder, isn't
| a good sign either (for efficiency). Nor is casting everything to
| a `char` (this won't support UTF8), or assuming an end quote
| followed by anything (:71) is a valid way to end a field.
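
A minimal sketch of two points from the comment above: throwing a framework exception (`FormatException`) instead of a custom `Error`, and rejecting stray characters after a closing quote. `StrictCsv` and `ReadQuotedField` are hypothetical names, not code from the linked repository. The rule that only a delimiter, a line break, or end of input may follow a closing quote is the stricter RFC 4180 reading, which is an assumption about what the commenter has in mind.

```csharp
using System;
using System.Text;

public static class StrictCsv
{
    // Hypothetical helper, not taken from the linked repository.
    // Reads one quoted field starting at text[pos] (which must be '"'),
    // moves pos to the character after the closing quote, and returns
    // the unescaped field value.
    public static string ReadQuotedField(string text, ref int pos, char delimiter = ',')
    {
        if (pos >= text.Length || text[pos] != '"')
            throw new ArgumentException("Field does not start with a quote.", nameof(pos));

        var sb = new StringBuilder();
        int i = pos + 1;
        while (i < text.Length)
        {
            char c = text[i];
            if (c != '"')
            {
                sb.Append(c);
                i++;
                continue;
            }
            // A doubled quote ("") is an escaped literal quote.
            if (i + 1 < text.Length && text[i + 1] == '"')
            {
                sb.Append('"');
                i += 2;
                continue;
            }
            // Closing quote: only a delimiter, a line break,
            // or end of input may follow it.
            i++;
            if (i < text.Length && text[i] != delimiter && text[i] != '\r' && text[i] != '\n')
                throw new FormatException(
                    $"Unexpected character '{text[i]}' after closing quote at index {i}.");
            pos = i;
            return sb.ToString();
        }
        throw new FormatException("Unterminated quoted field.");
    }
}
```

With this sketch, `"a""b",c` yields the field `a"b` and leaves `pos` on the comma, while `"a"x` throws a `FormatException` instead of being silently accepted.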
| neonsunset wrote:
| C# `char` is a UTF-16 code unit. It does not represent a byte,
| which is just `byte`.
|
| Having StringBuilder be a private field on the parser
| instance is not an issue either - it is simply reused.
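
To illustrate the `char` point, here is a small sketch that compares a string's length in UTF-16 code units with its size in UTF-8 bytes. `CharDemo` and `Measure` are hypothetical names made up for this example. It assumes the input has already been decoded into a .NET `string`, which is what a text reader normally does.

```csharp
using System;
using System.Text;

public static class CharDemo
{
    // Returns (number of UTF-16 code units, number of UTF-8 bytes) for s.
    // string.Length counts chars, i.e. UTF-16 code units, not bytes.
    public static (int Utf16Units, int Utf8Bytes) Measure(string s) =>
        (s.Length, Encoding.UTF8.GetByteCount(s));

    public static void Main()
    {
        Console.WriteLine(Measure("a"));          // ASCII letter
        Console.WriteLine(Measure("\u00E9"));     // e with acute accent
        Console.WriteLine(Measure("\u20AC"));     // euro sign
        Console.WriteLine(Measure("\U0001F600")); // emoji outside the BMP
    }
}
```

The euro sign is one `char` but three UTF-8 bytes, and the emoji is two `char` values (a surrogate pair) but four UTF-8 bytes. Since `,` and `"` are single code units that never appear inside a surrogate pair, comparing `char` values against them is safe on decoded text.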
| lolpanda wrote:
| I reviewed the code in this project and it looks pretty
| reasonable. I thought CSV was a loosely specified format. In my
| past experience, I never had a smooth experience moving data from
| one system to another using CSV. I had a lot of trouble with
| Snowflake -> CSV -> Clickhouse. I now use JSONL for pretty much
| everything.
| buybackoff wrote:
| There are the specs and the real world. The specs are more
| often than not on the far side of the moon: you never see
| them in real life. Oh, so many hours of my life were wasted on
| that. Real-world CSVs are as loosely specified as any free text
| in a notepad.
| neonsunset wrote:
| To counterbalance this, an extremely fast CSV parser (also
| written in C#, uses SIMD and multi-threading):
| https://github.com/nietras/Sep/
|
| p.s.: this one is unfortunately another parser that drains the
| input char-by-char into a list/vec, an approach which is very
| inefficient and plagues many languages, and which keeps it from
| taking advantage of vectorized string.Split. But other than that,
| I'm happy more people are noticing .NET.
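
A rough sketch of the alternative to appending one character at a time: let the runtime search for the next delimiter and copy each field as a whole slice. `FastSplit` and `SplitUnquoted` are hypothetical names, and the sketch assumes a line with no quoted fields, so it is not a full CSV parser. Whether the search is actually vectorized depends on the runtime; that part is the commenter's claim, not something this code demonstrates.

```csharp
using System;
using System.Collections.Generic;

public static class FastSplit
{
    // Hypothetical helper: splits a line that contains NO quoted fields.
    // Each field is found with one IndexOf call and copied as one slice,
    // rather than appended to a builder character by character.
    public static List<string> SplitUnquoted(ReadOnlySpan<char> line, char delimiter = ',')
    {
        var fields = new List<string>();
        while (true)
        {
            int idx = line.IndexOf(delimiter);
            if (idx < 0)
            {
                // No delimiter left: the remainder is the last field.
                fields.Add(line.ToString());
                return fields;
            }
            fields.Add(line.Slice(0, idx).ToString());
            line = line.Slice(idx + 1);
        }
    }
}
```

For input without quotes this returns the same fields as `line.Split(',')`, including empty ones: `a,b,,c` gives four fields, the third of which is empty.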
___________________________________________________________________
(page generated 2024-04-10 23:00 UTC)