[HN Gopher] Show HN: Tidy Viewer - a cross-platform CSV pretty p...
___________________________________________________________________
Show HN: Tidy Viewer - a cross-platform CSV pretty printer for
viewer enjoyment
Author : flusteredBias
Score : 314 points
Date : 2021-09-27 13:15 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| wly_cdgr wrote:
| I thought this article was about a new way of understanding
| actual televisions
| brap wrote:
| Same. Read this as some profound observation on what TVs
| actually are. Even thought to myself "huh, I guess OP does have
| a point..."
| c3534l wrote:
| What is CSV other than a matrix of data? It's matrices all the
| way down.
| BrianOnHN wrote:
| > "huh, I guess OP does have a point..."
|
| LMAO, me too!
|
| Guess I've been spending too much time thinking.
|
| Edit: this reminds me of Jimmy Kimmel's segment where they
| "bleep and blur whether they need it or not" so that innocent
| TV clips appear to have profanity/innuendo etc.
| flusteredBias wrote:
| LMAO. I wish.
| sigg3 wrote:
| This is kinda cool, but why not just use column?
|
| The video of catting into tv is equivalent to:
| column -s, -t FILE
| mixmastamyk wrote:
| It shows why in the readme, the benefits are small but real.
| Others have recommended improvements.
| timwis wrote:
| csvkit's csvlook works similarly but outputs a markdown table.
| mejari wrote:
| Does it really need libc6 >= 2.31? Having problems installing it
| on Ubuntu 16.04 LTS because of that dependency.
| flusteredBias wrote:
| Open an issue. I would have to look into that.
| flusteredBias wrote:
| I spend a lot of time in the terminal and want to quickly glance
| at a csv file without making a new script, opening excel, or
| using a tui. I made tidy-viewer (tv) because current tools like
| cat and column were not pretty enough.
|
| tv modifies raw files in the following ways:
|
| 1. NA detection and highlighting
| 2. Printing only significant digits
| 3. Header and footer metadata
|
| I have been using this a lot at work. There is a lot more work to
| do, but it is in a usable state.
|
| Give it a try! If you like it then star on Github!
| mileza wrote:
| I think it's a great first effort, but there are a number of
| possible improvements to make. The most obvious one would be to
| support passing the file as an argument instead of using cat or
| the redirection operator every time. It's great that it works
| with stdin to allow piping into it, but it's cumbersome if you
| just want to take a file and print it, which will no doubt be a
| common use case.
| MobiusHorizons wrote:
| Do you think 'tv <file.csv' does what you want well enough?
| What is the behavior when you run 'tv file.csv'? Does it just
| block waiting for input?
|
| I think it's great for a visualizer like this to encourage
| people to get used to the power of shell pipelines if
| possible.
| Someone wrote:
| I think it's a bad idea to go against half a century of
| conventions without good reason.
|
| It's very surprising to see a tool that only works as a
| filter, and doesn't take file paths as arguments.
| GuB-42 wrote:
| It works, but almost all UNIX commands that work on
| pipelines can take a list of files as arguments. Out of the
| commands I use regularly, "patch" is the only one that
| works exclusively from stdin, probably because file
| arguments have a different, somewhat obscure, and probably
| historical meaning.
|
| If appropriate, using files as arguments instead of using
| shell pipelines is a best practice. Commands can optimize
| for that use case, print better error messages, etc...
|
| And it is _not_ a good thing to encourage useless use of
| cat. If your goal is to show how your tool is to be used
| with pipelines, show an actually useful pipeline, for
| example "sed '1b;/abc/!d' file.csv | tv". The "sed"
| command prints the first line (header), and all lines
| containing "abc".
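The sed filter above keeps the header line and every later line that contains the search text. A rough Python equivalent, offered only as a sketch (the function name `filter_keep_header` and the sample rows are made up for illustration):

```python
def filter_keep_header(lines, needle):
    """Yield the first line (the header) and every later line containing needle."""
    for index, line in enumerate(lines):
        # Line 0 is always kept, mirroring sed's "1b"; other lines must match.
        if index == 0 or needle in line:
            yield line


sample = ["id,name\n", "1,abc\n", "2,xyz\n", "3,zabcz\n"]
result = list(filter_keep_header(sample, "abc"))
```

The filtered lines could then be piped into a viewer the same way the sed output is.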
| einpoklum wrote:
| First of all - kudos on tackling this task - it is indeed very
| annoying to get CSVs to render nicely on a terminal.
|
| 1. How does tidy-viewer compare with csvlook?
|
| 2. Looking at the demo video, there seems to be an odd fixation
| with "N/A". The CSV spec, AFAIK, doesn't recognize this phrase.
| I don't understand why someone would expect a quoted string
| field whose raw characters are "n/a" should be rendered as
| anything other than n/a (i.e. lowercase and without the
| quotes). I'm guessing maybe in your workflow you want to use
| that phrase a lot, but for a tool for the general public I'd
| not do this kind of interpretation; and I would leave an empty
| field as empty.
|
| 3. tidy-viewer seems to require "unstable library features", or
| at least ones which were unstable as of Rust 1.48.0 . It would
| be nice if you could be compatible with older rust
| distributions/versions.
|
| 4. Many systems, especially older ones, especially ones which
| you access remotely and don't have root privileges on, won't
| have a rust installation. It would be even more convenient if
| you could provide binaries with little or no extra dynamic
| library dependencies, which could be used on older / rustless
| systems. I realize this is a tall order, however.
|
| 5. What about scrolling? The worst part of viewing CSVs is
| having to handle wide ones which exceed the terminal width, and
| having decent horizontal as well as vertical scrolling ability.
| less doesn't cut it, because it doesn't keep the header row,
| plus it doesn't recognize field widths.
|
| 6. tidy-viewer does not seem to support wrapping longer fields
| onto multiple terminal lines.
|
| 7. When the user doesn't specify the color scheme, are you
| choosing one based on the terminal colors, or are you using
| absolute color values? I suggest the former.
|
| 8. tidy-viewer loads and parses the entire CSV immediately;
| and, in fact, seems to keep two copies of it in memory at once.
| This means it cannot be used with large files without
| thrashing; and even if your CSV does fit in global memory, it
| will still be kind of unusable, trying to dump gigabytes onto
| the terminal.
|
| Bottom line: A nice initial effort, but the more serious
| challenges are yet to be tackled, plus needs to be more
| robustly cross-platform.
| flusteredBias wrote:
| > First of all - kudos on tackling this task - it is indeed
| very annoying to get CSVs to render nicely on a terminal.
|
| > How does tidy-viewer compare with csvlook?
|
| The most important issue to me is that csvlook is a much less
| pleasant viewing experience, but there is also this:
| csvlook reads and parses all of the data. Try pushing
| diamonds.csv to csvlook. When I do it on my machine it takes
| 15.228 seconds while tv takes 0.0042 seconds. For this reason
| tv is much faster, but speed is not the goal of the package.
| tv's purpose is to maximize viewer enjoyment.
|
| > 2. Looking at the demo video, there seems to be an odd
| fixation with "N/A". The CSV spec, AFAIK, doesn't recognize
| this phrase. I don't understand why someone would expect a
| quoted string field whose raw characters are "n/a" should be
| rendered as anything other than n/a (i.e. lowercase and
| without the quotes). I'm guessing maybe in your workflow you
| want to use that phrase a lot, but for a tool for the general
| public I'd not do this kind of interpretation; and I would
| leave an empty field as empty.
|
| I could not say it better than this:
|
| > The norm of treating missing data as NA exists in R (which
| the developer of this is clearly inspired by based on the
| GitHub readme.). Pandas in Python is stuck with NaN for
| numeric types (not quite correct) and "" or None for string
| types. Personally I like the choice to both explicitly render
| missing data in colour and to apply NA as a placeholder text
| to display that colour.
|
| > 3. tidy-viewer seems to require "unstable library features",
| or at least ones which were unstable as of Rust 1.48.0 . It
| would be nice if you could be compatible with older rust
| distributions/versions.
|
| That is a good point. I also release binaries which I think
| makes this requirement less needed. What are your thoughts?
|
| > 4. Many systems, especially older ones, especially ones which
| you access remotely and don't have root privileges on, won't
| have a rust installation. It would be even more convenient if
| you could provide binaries with little or no extra dynamic
| library dependencies, which could be used on older / rustless
| systems. I realize this is a tall order, however.
|
| With github actions I auto-build binaries for many OSes. See
| https://github.com/alexhallam/tv/releases/tag/0.0.13
|
| > 5. What about scrolling? The worst part of viewing CSVs is
| having to handle wide ones which exceed the terminal width,
| and having decent horizontal as well as vertical scrolling
| ability. less doesn't cut it, because it doesn't keep the
| header row, plus it doesn't recognize field widths.
|
| Scrolling is nice. To offer scrolling the only option I am
| aware of is turning this _cli_ into a _tui_. I made the
| choice early on to take the more minimal path and stick
| to a _cli_. The goal is to be a `column` replacement not a
| spreadsheet replacement.
|
| > 6. tidy-viewer does not seem to support wrapping longer
| fields onto multiple terminal lines.
|
| The goal is to glance at the data as a whole not a cell or
| fields. If there are cells with long text they get cut at 20
| characters. I like this a lot. I would prefer to know that
| there is a lot of text that I can dig into later, but when I
| am glancing at the csv I just want an overall picture. In my
| view tables of data are data visualizations meaning that I
| don't have to show everything to understand enough of it.
|
| > 7. When the user doesn't specify the color scheme, are you
| choosing one based on the terminal colors, or are you using
| absolute color values? I suggest the former.
|
| Great question. I want to eventually add the ability for
| users to make a config file with their own colors. At this
| time I just have absolute presets. If you are interested I
| would happily take a contribution that allows users the
| option to configure tv with some dotfile.
|
| > 8. tidy-viewer loads and parses the entire CSV immediately;
| and, in fact, seems to keep two copies of it in memory at
| once. This means it cannot be used with large files without
| thrashing; and even if your CSV does fit in global memory, it
| will still be kind of unusable, trying to dump gigabytes onto
| the terminal.
|
| That is almost true. tidy-viewer reads the entire csv, but
| only parses the head. If I knew of a way to get the number of
| rows and columns of a csv without reading the whole file then
| I would. I know there is a good deal more room for memory
| optimization. This is not my strength and I am still
| learning.
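One way to get the row count without holding the whole file in memory is a single streaming pass that retains only the first rows and merely counts the rest. The following is a sketch under that assumption, not tv's actual implementation; `head_and_row_count` is a hypothetical name, and Python's `csv` module still tokenizes every row it counts:

```python
import csv
import io
from itertools import islice


def head_and_row_count(stream, n):
    """Return (header, first n data rows, total data rows) in one pass."""
    reader = csv.reader(stream)
    header = next(reader, None)
    # Only these n rows are kept in memory for display.
    head = list(islice(reader, n))
    # The remaining rows are counted and immediately discarded.
    remaining = sum(1 for _ in reader)
    return header, head, len(head) + remaining


data = "a,b\n1,2\n3,4\n5,6\n"
header, head, total_rows = head_and_row_count(io.StringIO(data), 2)
```

Memory use then depends on `n` rather than on the file size, although the full input is still read once.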
|
| > 9. Bottom line: A nice initial effort, but the more serious
| challenges are yet to be tackled, plus needs to be more
| robustly cross-platform.
|
| Thanks for the compliment. It is still a work in progress.
| notafraudster wrote:
| The norm of treating missing data as NA exists in R (which
| the developer of this is clearly inspired by based on the
| GitHub readme.). Pandas in Python is stuck with NaN for
| numeric types (not quite correct) and "" or None for string
| types. Personally I like the choice to both explicitly render
| missing data in colour and to apply NA as a placeholder text
| to display that colour.
| sneak wrote:
| Your asciinema playback made me twitch. Is the lack of a
| trailing space in your PS1 intentional?
| IgorPartola wrote:
| This looks great! I wonder how long it'll be until someone
| posts a long awk snippet that will do something similar and
| claims this isn't progress, but rest assured that they are
| wrong. I'm adding tv to my toolbox.
| flusteredBias wrote:
| Thanks! I appreciate the compliment!
| contravariant wrote:
| The NA detection and highlighting is nice but I'm not sure how I
| feel about showing anything other than the exact textual value.
| I don't mind abridging quotes when they're not necessary, but
| showing "N/A", NA,, etc. as the same value is a bit iffy.
| psadauskas wrote:
| When presented with a similar problem, I tend to use non-
| ascii characters. For example, in my `~/.psqlrc` I have:
| \pset null ␀
|
| Looks like this in output:
| 40 | ␀ | 2021-09-23 20:42:32.536571 |
| 41 | 15 | 2021-09-23 20:42:33.177474 |
| 42 | 19 | 2021-09-23 20:42:33.212133 |
| 43 | ␀ | 2021-09-23 20:42:33.247346 |
| a1369209993 wrote:
| I was going to complain about null bytes in text (never,
| period), but then realized you actually did mean the U+2400
| SYMBOL FOR NULL[0] character itself. That's surprisingly
| viable (though you do now have to worry about the string
| "\xE2\x90\x80" ending up in your data).
|
| 0: Which is actually incorrectly named - it should be
| "SYMBOL FOR NUL".
| flusteredBias wrote:
| It is rough. There are many ways that different tools put
| NAs, na, N/A, "", etc. in a file. To choose only "NA" would
| mean I would be excluding the output of other tools. I chose
| accessibility over specificity. #trade-offs.
| ayoubElk wrote:
| Maybe you could just background highlight the empty cells?
| flusteredBias wrote:
| Well, I think being promiscuous with "NA", "N/A", nan,
| etc. is a separate issue from a blank cell. A blank cell
| is literally missing. That should be filled with NA.
| nicoburns wrote:
| > A blank cell is literally missing. That should be
| filled with NA.
|
| Why? "NA" stands for "Not Applicable", but a blank cell
| in a CSV could represent any number of things of which
| "Not Applicable" is only one.
| flusteredBias wrote:
| haha. You are right "NA" stands for "Not Applicable".
| That is not always how people/programs use it though.
| What are some alternatives that you would suggest? I am
| happy to learn.
| nicoburns wrote:
| I would suggest something similar to what other people have
| suggested where you color the background of the cell red
| and then just display the literal content of the cell. I
| think it would be reasonable to have this configurable
| via command line arguments though, so if you like the
| "NA" that could also be a mode.
|
| Perhaps it would make sense to have a "pretty" mode and a
| "literal" mode (which would also turn off the clever
| processing of numbers)?
| eevilspock wrote:
| Simple: provide CLI switches to let the user decide what
| they want for NA detection (current behavior as default,
| user can provide alternate NA values, per the source file
| or the natural language it is expressed in), and how they
| want them displayed, whether as-is, blank or a consistent
| custom value (as-is should be the default).
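The switches proposed here (a user-supplied set of NA spellings plus a choice between as-is display and a replacement value) could look roughly like the sketch below. Everything in it is hypothetical: `DEFAULT_NA_STRINGS`, `render_cell`, and the particular spellings are illustrations, not options that tv provides.

```python
# Hypothetical default set of spellings treated as missing.
DEFAULT_NA_STRINGS = frozenset({"", "NA", "N/A", "na", "nan", "NaN", "null"})


def render_cell(value, na_strings=DEFAULT_NA_STRINGS, na_display=None):
    """Return (text_to_show, is_missing) for one cell.

    With na_display=None the cell is shown as-is and only flagged,
    so the caller can highlight it without changing its text.
    """
    is_missing = value.strip() in na_strings
    if is_missing and na_display is not None:
        return na_display, True
    return value, is_missing


as_is = render_cell("N/A")                        # flagged, text unchanged
replaced = render_cell("", na_display="NA")       # flagged, shown as "NA"
custom = render_cell("N/A", na_strings={"-"})     # not missing under a custom set
```

Keeping detection separate from display is what lets "highlight but show the literal text" and "show a uniform placeholder" coexist as modes.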
| contravariant wrote:
| Fair enough but that doesn't explain why you chose to
| display all of them as "NA". As you say there are lots of
| different ones, hence it would be a bad idea to pick one as
| the 'default' to display. To me it's important whether
| something is missing, filled with "N/A", or "null", or "Not
| Applicable" etc.
| asicsp wrote:
| Hey, if you are able to edit the title of this submission, can
| you add `Show HN: ` prefix?
|
| See https://news.ycombinator.com/showhn.html for details.
| flusteredBias wrote:
| Blast, I am sorry. It looks like I can't edit the title
| anymore. Otherwise I would make the change.
| moritonal wrote:
| Man.. TV is such a good name for a visualiser tool that it'd be
| excellent if it could pretty-print any content given to it.
|
| Does your framework support the idea of pre-parsing the file's
| content and selecting an appropriate renderer, or is it fairly
| tied to CSVs?
| inostia wrote:
| I actually like `tidy` better and honestly would rather have
| `cat a.csv | tidy ...`. But it's probably already a thing.
| hnlmorg wrote:
| Nice idea. I might actually work on this myself
| flusteredBias wrote:
| I would need more details on what you mean by "pre-parsing".
| It works with any delimiter; it could be comma-separated,
| pipe-separated, etc.
| mixmastamyk wrote:
| Doesn't need to be pre-parsed. Perhaps give the filename to
| the utility instead of content via stdin. Then filename
| gives a hint. If there is none, run "file filename" (via
| library) beforehand.
| berlinquin wrote:
| Cool project! I'm familiar with column, and this looks like a
| good replacement.
|
| Curious, how do you handle formatting on cells with long
| strings that need to overflow to multiple lines? As soon as you
| try to optimize the column widths for table length, you start
| hitting an NP-hard problem.
|
| https://quintenkent.com/content/column-problem.html
| flusteredBias wrote:
| I actually read that article when I started making the
| package. You can see some of the input data here
| https://github.com/alexhallam/tv/blob/main/data/a.csv. I let
| the user choose how long the max column width should be, then
| append "...". The default value is 20 characters.
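The truncation rule described here (cut at a maximum width, then append "...") can be sketched as follows. This is an illustration only; `truncate_cell` is a made-up name, and whether the real tool counts the marker toward the width is an assumption (here it does not):

```python
def truncate_cell(text, max_width=20, marker="..."):
    """Cut text to max_width characters and append marker when it was cut."""
    if len(text) <= max_width:
        return text
    return text[:max_width] + marker


short = truncate_cell("short")
long_text = truncate_cell("a" * 25)
```

Because every cell is bounded this way, column widths never need the multi-line layout optimization discussed in the linked article.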
| TacticalCoder wrote:
| It's very weird for a project made to "maximize viewer enjoyment"
| to not put a space after the prompt. The one saved character on
| the line is definitely not worth the illegible resulting line:
| this doesn't maximize my enjoyment at all when viewing the
| examples.
| drcongo wrote:
| Some might call this comment petty, but that was making my
| brain itch too.
| stavros wrote:
| That's just the shell prompt on the recording machine. You can
| use your own prompt in your shell.
| hnlmorg wrote:
| That's a really uncharitable comment considering it's the
| developer's prompt and has nothing to do with `tv` aside from
| appearing in the asciinema demo.
| dotancohen wrote:
| Very nice! How does it handle CSVs that are wider or longer than
| the terminal? How does it deal with columns that are
| exceptionally long, or multiline?
|
| Often when working with large CSV files, I'll need to show or
| hide specific columns, especially if they are very long. Also,
| grepping the output for a specific line will hide the header as
| well, not to mention make the output unnecessarily wide if non-
| matching lines have longer fields than do the matching lines. So
| a built-in grepping feature would make this very useful.
| flusteredBias wrote:
| > How does it handle CSVs that are wider or longer than the
| terminal?
|
| Exceptionally long columns are handled by cutting the text and
| appending an ellipsis if it is over 20 characters.
|
| > a built-in grepping feature would make this very useful.
|
| see the following for csv data manipulation:
|
| xsv - Command line csv data manipulation. Rust
|
| csvtk - Command line csv data manipulation. Go
|
| tsv-utils - Command line csv data manipulation toolkit. D
|
| q - Command line csv data manipulation query-like. Python
|
| miller - Command line data manipulation, statistics, and more.
| C
| dotancohen wrote:
| Terrific, thank you!
| rlue wrote:
| Why not visidata?
|
| https://www.visidata.org/
| https://www.youtube.com/watch?v=N1CBDTgGtOU
|
| (It does much, much more than pretty printing, but no reason you
| can't use it for that.)
| qwertyuiop_ wrote:
| This is why I love HN. Never knew this existed. It has become
| my favorite tool in the 5 mins since I installed it. Also
| reminds me of Mainframe programs that I encountered in the
| past. I wish we had more tools like this instead of electron
| mouse click based apps for people who prefer speed and
| keyboard.
| flusteredBias wrote:
| I love visidata! But when I want to just glance at a csv file I
| reach for tv (I used to use `column` which is more of a tv
| competitor than visidata). This is for a couple reasons.
|
| 1. tv gives a quick summary of the count of rows and columns
|
| 2. tv shows all columns at the bottom that don't fit in the
| terminal. With vd I have to scroll on wide data.
|
| 3. tv guides the eye to missing data better with NA highlights
|
| 4. tv has sigfig logic that is better. I work with files where
| the decimal dust can become long. Those unnecessary characters
| push remaining columns off the screen. This means the user
| would need to scroll over to see additional columns. I generally
| think it is better to avoid additional key presses if possible.
|
| 5. tv is fast for large files. It does not have to read and
| format all of the data like vd. tv is focused more on _looking_
| at the file and not _operating_ on the file. It does not have to do
| as much as vd. That helps tv with what it is uniquely good at.
| "Do one thing and do it well"
|
| It does not matter if your file is really wide (lots of
| columns) or really long -- tv will give the user a compact
| useful pretty print of the data. Why not use vd as a TUI
| spreadsheet and tv for glancing at csv files? They are both
| great tools in my eyes with different purposes.
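The "decimal dust" point in item 4 can be illustrated with a small formatter that rounds fractional values to a few significant digits while never dropping integer digits. This is a guess at the general idea, not tv's actual sigfig rule; `trim_decimal_dust` and the default of 3 digits are assumptions made for the sketch.

```python
from decimal import Decimal, InvalidOperation


def trim_decimal_dust(text, digits=3):
    """Round a numeric string to `digits` significant digits, keeping integer digits."""
    try:
        number = Decimal(text)
    except InvalidOperation:
        return text  # not a number: leave the cell untouched
    if not number.is_finite() or number == number.to_integral_value():
        return text  # NaN, infinity, and whole numbers are shown as-is
    # adjusted() is the exponent of the most significant digit.
    decimals = max(digits - 1 - number.adjusted(), 0)
    return str(round(number, decimals))


pi_like = trim_decimal_dust("3.14159265")
large = trim_decimal_dust("1234.5678")
small = trim_decimal_dust("0.000123456")
```

Shorter numeric cells mean narrower columns, which is how this kind of rule lets more columns fit in the terminal.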
| saulpw wrote:
| Hey there, VisiData author here. Nice work with tv! I'm sure
| it's more useful than VisiData for certain use cases. I just
| want to clear some things up since there are a few
| misconceptions here (which will happen if you don't use
| VisiData a lot):
|
| 1. In VisiData, the number of rows is always shown in the
| lower right, and you can see the number of columns with
| either Ctrl+G or a list of the columns with Shift+C. Or
| Shift+I for the list of columns with summary statistics
| (mode/distinct/errors/etc). This is an extra keystroke, but
| the amount of data you can get with that keystroke more than
| justifies it.
|
| 5. VisiData will instantly open and show any file it can, and
| continue to load the rest until it's done or you press Ctrl+C
| (or quit). Everything in VisiData is lazily evaluated, so
| it's not actually doing any more work than tv when you view
| the first page of rows, and then you can see the next few
| pages of rows with only one keystroke (PgDn, as opposed to
| having to edit a command and rerun it). Fewer keypresses ftw!
|
| A lot of people think VisiData is a TUI spreadsheet, but vd
| is not a "spreadsheet" in the classic sense, as it's not
| cell-based. Its primary use-case is exploring and wrangling
| tabular data. It just turns out that this is what a lot of
| people are doing with their spreadsheets, but they have to
| bend over backwards to get Excel/whatever to play nice with
| their data's structure. By the same token, if you try to do
| little single-cell formulas in VisiData, it's going to be
| quite difficult.
|
| For people who like static binaries and only need to view a
| few rows of CSV files, or produce part of a larger report in
| a pipeline, tv could be a better fit than VisiData,
| especially if it continues to be maintained. I'm always
| excited to see new data tools in the terminal space!
| certifiedloud wrote:
| Visidata is a great interactive tool. TV seems like it would be
| better when scripting, or in one-liners.
| dotancohen wrote:
| For scripting I would use grep and cut, maybe awk. For
| scripting with CSV files, at least in my experience, you
| usually want specific columns from specific lines.
|
| If TV had a switch for specifying only certain columns, that
| would make the job much easier.
| flusteredBias wrote:
| sounds like you are looking for xsv. I like that tool a lot
| for selecting specific columns.
| semireg wrote:
| Consider adding a code snippet showing a.csv output. I had to
| watch a video just to see text.
| flusteredBias wrote:
| I have some work to do on the README. I will show the output
| better. The difficulty with showing the output only is that it
| does not capture the coloring. Maybe I will show the output, or
| add a picture, or have an animated gif. Maybe all three.
| GrayShade wrote:
| Not OP, but there's a screenshot under
| https://github.com/alexhallam/tv#1-na-comprehension.
|
| Which is actually worse for screen reader users, I suppose.
| Someone wrote:
| And a list of options. Reading the first 80 lines of
| https://github.com/alexhallam/tv/blob/main/src/main.rs was in
| some sense more educational than the readme.
|
| It, for example, allowed me to make an educated guess as to the
| answer to the question "how does this handle huge files?". It
| by default only reads 25 lines.
|
| (That makes the example from the header: cat
| diamonds.csv | head -n 35 | tv
|
| a bad example. You shouldn't need that _head_ in-between.)
|
| However, line 167 says
| //.take(row_display_option + 1)
|
| That seems to indicate this reads the entire file into memory,
| and that guess wasn't that educated at all.
| 6502nerdface wrote:
| A while ago, Two Sigma Investments open-sourced its own curses-
| based internal tool for pretty printing tabular data:
| https://github.com/twosigma/ngrid
| stavros wrote:
| This is quite nice, but I don't like how it cuts off the output
| (instead of making it scrollable). Also, why require the use of
| `cat`? Accepting a filename so I can do `tv foo.csv` would be
| much more ergonomic, in my opinion.
| bityard wrote:
| Why waste code teaching the program how to open files correctly
| when the shell already knows?
|
| tv < foo.csv
| stavros wrote:
| Because I'd rather the program did the work and let me skip
| the extra keystroke.
| GrayShade wrote:
| Cool project (and written in my favourite language), but I really
| hate how there's no space at the end of your prompt.
| stavros wrote:
| I have to agree, I felt dirty watching the recording. The
| project looks great, though.
| BiteCode_dev wrote:
| It feels like when somebody opens a ( but doesn't close it.
| hoosieree wrote:
| )
| BiteCode_dev wrote:
| There is such a big latency in HN for self closing
| parenthesis.
| flusteredBias wrote:
| Love the idea! Already opened the issue and will merge a PR
| before the next release.
| [deleted]
| GrayShade wrote:
| Ah, I didn't mean a newline at the end of the output. Just a
| space after your shell prompt ($), see e.g.
| `user@~/code/data$cat` in
| https://github.com/alexhallam/tv/blob/main/img/column_v_tv2.....
| flusteredBias wrote:
| lol. Got it.
| lambic wrote:
| And the use of cat on a single file..
| netcraft wrote:
| I don't understand your comment - could you expand on it?
| [deleted]
| ComputerGuru wrote:
| cat just regurgitates the contents of the file, but the
| resulting piped fd is non-seekable. Since almost every
| command that can operate on a file from stdin can also
| operate on the file by name/path, at best this is just a
| needless invocation of a process (i.e. `tv foo.csv` should
| have been used instead of `cat foo.csv | tv`) - if the app
| in question can't handle paths, then you can have the shell
| pipe the file into it instead (e.g. `tv < foo.csv`). At
| worst, the recipient program would need to buffer the
| entire contents of the input if it needs to perform non-
| sequential operations on the source data - this is the case
| with things like `tac` that need to seek to the end of the
| input (see https://github.com/neosmart/tac for how `cat foo
| | tac` requires buffering but both `tac foo` and even `tac
| < foo` don't).
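The seekable versus non-seekable distinction can be checked from inside a program. The sketch below (POSIX behaviour assumed; `can_seek` is a hypothetical helper) compares a regular file with the read end of a pipe, which is what a program sees on stdin after `cat foo.csv |`:

```python
import os
import tempfile


def can_seek(stream):
    """Return True when the stream supports random access, False otherwise."""
    try:
        return stream.seekable()
    except AttributeError:
        return False


# A regular file, as the program would receive with `tool < foo.csv`.
with tempfile.TemporaryFile() as regular_file:
    file_is_seekable = can_seek(regular_file)

# The read end of a pipe, as the program would receive with `cat foo.csv | tool`.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_reader, os.fdopen(write_fd, "wb"):
    pipe_is_seekable = can_seek(pipe_reader)
```

A tool that needs a second pass over its input can use a check like this to decide whether it must buffer the data first.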
| netcraft wrote:
| thank you! I knew some of that but learned a lot too.
| mthoms wrote:
| Google the phrase "useless use of cat".
|
| To some, it's a faux-pas. Personally, I like the
| aesthetics of cat for my own scripts. It follows the
| "pipe flowing" idiom better.
|
| There are performance reasons why "useless cat" should be
| avoided though. So avoid it where performance is
| important (or when some other hardcore CLI jockey is
| going to see your code :))
| hibbelig wrote:
| I understand that people say you can replace
| foo < bar
|
| with <bar foo
|
| but I've never been able to get myself to do this.
| davidatbu wrote:
| What is the value-add when compared to using `xsv` to pretty
| print? Is it only the fact that it highlights NA values?
| flusteredBias wrote:
| xsv is one of my favorite data manipulation tools. Also, the
| author of that package is one of the best developers I know. I
| use xsv _with_ tv. I normally pipe the output of xsv to tv.
|
| 1. As you noted NA comprehension
| 2. Column overflow logic for different sized terminals
| 3. Summary meta data in the header
| 4. Significant digits logic. This allows users to view more
| columns than they would otherwise view due to decimal dust
| shifting the columns over.
| 5. This is the most important! It looks really pretty!
| thamer wrote:
| Some (most?) tools that output data in columns and fit each one
| to the largest value in that column need to scan the whole file
| as a first pass just to start displaying data.
|
| Not only is it the case with this tool, but from what I'm reading
| in main.rs it looks like it's also loading _the whole file_ in
| memory. I was going to say that scanning the file was a deal-
| breaker, but if true this is much more resource-intensive.
|
| This looks like a nice tool, but these design choices seem to
| limit its use to relatively small files. It could be updated to
| have a read-ahead buffer instead and adjust its output as new
| lines are discovered with values of different width, although
| doing this without a jarring resize could be challenging.
|
| Could someone with better knowledge of Rust than mine confirm
| this?
|
| I see the full dataset being loaded here[1] and the column widths
| being computed here.[2]
|
| [1]
| https://github.com/alexhallam/tv/blob/main/src/main.rs#L183-...
|
| [2]
| https://github.com/alexhallam/tv/blob/main/src/main.rs#L218-...
| flusteredBias wrote:
| > these design choices seem to limit its use to relatively
| small files
|
| 1. As a rule-of-thumb, I have been working on functionality
| before optimization. That said, `tv` is really fast. It is
| completely false that `tv` only works for relatively small
| files. I just pushed a 624MB file to `tv`. It ran in 2.8
| seconds. With `column` it takes 5.0 seconds. Now, I would love
| help from programmers smarter than me. I am sure there are a
| lot of optimization gains to be had in `tv`. I just wanted to
| make sure potential users are not misled. `tv` is performant.
|
| > Some (most?) tools that output data in columns and fit each
| one to the largest value in that column need to scan the whole
| file as a first pass just to start displaying data.
|
| > Not only is it the case with this tool, but from what I'm
| reading in main.rs it looks like it's also loading the whole
| file in memory.
|
| 2. `tv` reads once, but parses partly. This means that it reads
| the full file only to grab the number of rows. It only
| parses (takes) the first n rows.
|
| https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...
|
| https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...
| TAForObvReasons wrote:
| If the goal is to calculate the correct column width, you
| have to do one pass through the data before writing the first
| row.
|
| If the file can be read multiple times (not a UNIX stream),
| you can just read the file twice.
|
| If the file is a stream, instead of retaining the entire
| dataset in memory, you can write to a temporary file and re-
| parse it after calculating the widths.
| flusteredBias wrote:
| The correct column width is calculated from the first n
| rows not the full file.
|
| A stream does not work for tv because a stream does not
| know how many rows are in the file a priori. Displaying the
| dimensions of the file is a priority for `tv`. I am very
| happy with that trade-off. I would rather know the
| dimensions of a file than have a file stream of unknown
| dimensions.
| adamdusty wrote:
| If you did it the way he's talking about you would stream
| through the file to find how many rows and write the file
| as a temp file that you could re-parse for the actual
| data.
|
| I'm not saying you should or shouldn't, but your use case
| doesn't bar you from using streams.
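The temp-file approach described in this subthread can be sketched as below: a one-shot stream is copied to a temporary file while rows are counted and column widths tracked, and the file is then rewound for the display pass. This is an illustration of the idea only, not how tv works; `spool_and_measure` is a made-up name, and re-serializing rows through `csv.writer` (which may change quoting) is a simplification.

```python
import csv
import io
import tempfile


def spool_and_measure(stream):
    """Copy a one-shot CSV stream to a temp file; return (file, row_count, widths)."""
    spool = tempfile.TemporaryFile(mode="w+", newline="")
    writer = csv.writer(spool)
    widths = []
    row_count = 0
    for row in csv.reader(stream):
        writer.writerow(row)
        row_count += 1
        for i, cell in enumerate(row):
            if i == len(widths):
                widths.append(0)  # first time this column index is seen
            widths[i] = max(widths[i], len(cell))
    spool.seek(0)  # rewind so the caller can re-parse for display
    return spool, row_count, widths


source = io.StringIO("id,name\n1,alexander\n22,bo\n")
spool, row_count, widths = spool_and_measure(source)
second_pass = list(csv.reader(spool))
spool.close()
```

Memory stays bounded by the widest row rather than the whole dataset, at the cost of writing the input to disk once.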
| flusteredBias wrote:
| I see. Thanks for the clarification.
| eevilspock wrote:
| I like this idea. I don't think it would be jarring if the
| read-ahead buffer was a minimal number of lines, i.e. looking
| like distinct pages. The default could be at least the line
| height of the terminal, or some multiple.
|
| There could be an option to redisplay the header row for
| resized "pages".
|
| There could be a CLI switch giving the user control, i.e. make
| everyone happy.
| killjoywashere wrote:
| I think data scientists will recognize this problem, and
| there's a well-used solution: .head()
|
| Just show me the top 5 rows. That's all most people are
| looking for.
|
| cat data/a.csv | tv --head
| franga2000 wrote:
| > Just show me the top 5 rows. That's all most people are
| looking for.
|
| Is it? I'd wager that can't be more than half its use at
| most. Accessing a specific section that could be at any
| section of the file is very common in my experience, as is
| truly random access. Both of these, as well as the first
| few rows use case, are far better served by a page system.
| fiddlerwoaroof wrote:
| Or: head -n5 data/a.csv | tv
| eli wrote:
| Unless your csv has embedded line breaks
| flusteredBias wrote:
| Can you give an example of what you mean? If it breaks tv
| then I would like to add it to the automated tests and
| see if we can work on it.
| eli wrote:
| No sorry, I assume tv is fine. The problem is in assuming
| `head -n5` gives you 5 rows and piping that into tv.
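|
| To make the point concrete: a quoted field may contain a newline, so
| physical lines and CSV records are not the same thing. The sketch
| below uses Python's csv module and made-up sample data; `csv_head`
| is a hypothetical helper that truncates by record instead of by
| line.

```python
import csv
import io
from itertools import islice

# One header plus two records; the first record spans two physical lines.
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

physical_lines = data.splitlines()
records = list(csv.reader(io.StringIO(data, newline="")))
print(len(physical_lines), len(records))


def csv_head(stream, out, n):
    """Write the header and the first n records, whatever their line count."""
    writer = csv.writer(out, lineterminator="\n")
    for row in islice(csv.reader(stream), n + 1):
        writer.writerow(row)


out = io.StringIO()
csv_head(io.StringIO(data, newline=""), out, 1)
print(out.getvalue(), end="")
```

| A line-based `head` would have cut the first record in half; the
| record-based version keeps the quoted field intact.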
| flusteredBias wrote:
| Oo I see. Thanks for clarifying.
| iso1210 wrote:
| Demo looks great, alas the prebuilt binaries don't work
|
|     $ wget https://github.com/alexhallam/tv/releases/download/0.0.10/tidy-viewer
|     --2021-09-27 16:47:43--  https://github.com/alexhallam/tv/releases/download/0.0.10/tidy-viewer
|     Resolving github.com (github.com)... 140.82.121.3
|     Connecting to github.com (github.com)|140.82.121.3|:443... connected.
|     HTTP request sent, awaiting response... 404 Not Found
|     2021-09-27 16:47:44 ERROR 404: Not Found.
|
| The deb in https://github.com/alexhallam/tv/target doesn't exist
| either
| flusteredBias wrote:
| 1. Looks like you are using 0.0.10; the current version is
| 0.0.13.
|
| 2. I need to update the README; the binaries are here:
| https://github.com/alexhallam/tv/releases/tag/0.0.13
|
| I recently moved away from manually building binaries to
| automated building for many architectures. I am still learning
| how to use github actions to build for a matrix of
| architectures.
| leptoniscool wrote:
| CSV is still a major part of the development ecosystem, amazing
| that it has such staying power after all these years.
| treve wrote:
| I've made a Lotus 1-2-3 inspired CSV viewer for the terminal too.
| Had big plans for it, but it's just a basic viewer now:
|
| https://github.com/evert/csv123
| flusteredBias wrote:
| Keep at it. I think it looks great!
| oaiey wrote:
| There should be a new package next to the traditional GNU tools
| containing the modern tools people need, e.g. jq, curl or tv.
| Sometimes I really miss the extended sw package on some machines.
| flusteredBias wrote:
| Sounds like a good "awesome" page on github.
| gurgeous wrote:
| Hey, great start. I spend half my day in CSVs and I am definitely
| your target audience. Most of the time I use bat, visidata or
| tabview. In many ways tabview is the best, though recently the
| project has been abandoned.
|
| tv looks excellent. Fun name. I think if you added a couple of
| features it would ascend to my toolbox:
|
| (1) scrolling (horizontal and vertical)
|
| (2) better command line parsing. Running "tv" without stdin or
| arguments should produce an error/help. Running "tv xyz.csv"
| should read that file.
|
| Good luck!
| ComputerGuru wrote:
| XSV [0] can also pretty-print (minus the colors), but that's just
| the tip of the iceberg as far as what it can do. It's very handy
| for quick statistical analysis of CSV input.
|
| [0]: https://github.com/BurntSushi/xsv
| flusteredBias wrote:
| I love xsv! I mention in the readme that command line data
| manipulation tools are great complements to tv.
|
| https://github.com/alexhallam/tv#tools-to-pair-with-tv
| baggiponte wrote:
| that's exactly the comment I was looking for! xsv is super
| powerful and I think you might both draw inspiration from one
| another. I read above that tv reads everything into memory:
| maybe you can exploit some xsv tricks to avoid that. I feel
| tv looks great to visualise the outcome at the end of a
| pipeline, perhaps with xsv. I am no Ruby expert either, but
| this can become a cool Homebrew binary: people on macOS will
| use it too!
| flusteredBias wrote:
| I will add some Homebrew installation instructions. That is
| now an open issue. I want this tool to be highly
| accessible. Again, xsv is the best. I like the idea of
| small utilities that specialize in a specific task.
|
| From the Unix philosophy:
|
| > Make each program do one thing well. To do a new job,
| build afresh rather than complicate old programs by adding
| new "features".
| frankfrank13 wrote:
| Brings a tear to my eye. Everything else I've used is so heavy-
| handed, I just wanted the jq of CSVs
| tambourine_man wrote:
| This looks great.
|
| How well does it handle the edge cases of CSV, like escaped
| commas, quoted text, escaped quotes, and all that fun, fun stuff?
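|
| For readers unfamiliar with these edge cases, here is a small
| illustration of the usual quoting conventions, using Python's csv
| module as a stand-in. It is not the CSV library tv uses, and the
| sample data is made up.

```python
import csv
import io

# A comma inside quotes is data, and a doubled quote is a literal quote.
raw = 'name,quote\n"Smith, Jane","She said ""hi"""\n'

rows = list(csv.reader(io.StringIO(raw, newline="")))
for row in rows:
    print(row)

# Writing the parsed rows back out reproduces the original quoting.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(rows)
```

| A naive split on "," would yield three fields for the second line
| instead of two.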
| flusteredBias wrote:
| If you come across an edge case that tv does not handle then
| let me know. I will add a test csv file to the current
| portfolio of test csvs.
| https://github.com/alexhallam/tv/tree/main/data
| GrayShade wrote:
| Pretty well, I'd guess. It uses a well-tested CSV library.
| flusteredBias wrote:
| I have been using tv now for a couple months at work. It has
| been working well on the data I see. If you find edge cases
| then please open an issue with an example csv.
| claimred wrote:
| Wanted to mention that Windows PowerShell supports pretty CSV
| printing out of the box, like so:
|
|     Import-Csv .\Levels.csv | Format-Table
|
|     Count Level elevation Level name Name           Object type Unique ID
|     ----- --------------- ---------- ----           ----------- ---------
|     1     -600.0000000    Store      -0,600 - Store Level       dc611fed-1783-d759-053a-b19848c51491
|     1     2850.0000000    Store      +2,850 - Store Level       c59f2ae4-0e94-6ea0-bd82-8306971e628c
|     1     3350.0000000    Roof       +3,350 - Roof  Level       7b487ac2-e102-dc23-6ad9-81c39124de1d
| slaymaker1907 wrote:
| PowerShell is actually pretty good at manipulating CSV and
| JSON. However, I would definitely recommend using v7 (i.e.
| pwsh) since it has many improvements over v5 (default on
| Windows). For example, Group-Object seems to be several orders
| of magnitude faster using the latest version.
| sixothree wrote:
| There's also ConsoleGridView.
|
| https://devblogs.microsoft.com/powershell/introducing-consol...
| qorrect wrote:
| Dang, looks sick. Wonder if I can get it on *nix.
| hnlmorg wrote:
| Shameless plug, but so does my shell,
| https://github.com/lmorg/murex
|
|     $ open test/example.csv | format generic
|     Login email         Identifier  One-time password  Recovery code  First name  Last name  Department   Location
|     rachel@example.com  9012        12se74             rb9012         Rachel      Booker     Sales        Manchester
|     laura@example.com   2070        04ap67             lg2070         Laura       Grey       Depot        London
|     craig@example.com   4081        30no86             cj4081         Craig       Johnson    Depot        London
|     mary@example.com    9346        14ju73             mj9346         Mary        Jenkins    Engineering  Manchester
|     jamie@example.com   5079        09ja61             js5079         Jamie       Smith      Engineering  Manchester
|
| My shell also aims to have closer compatibility with POSIX
| (albeit it's not a POSIX shell) so you can use all the same
| command line tools you're already familiar with too (which, for
| me at least, was the biggest hurdle in my adoption of
| PowerShell).
|
| It also supports other file types out of the box too. eg
| jsonlines
|
|     $ open test/example.csv | format jsonl
|     ["Login email","Identifier","One-time password","Recovery code","First name","Last name","Department","Location"]
|     ["rachel@example.com","9012","12se74","rb9012","Rachel","Booker","Sales","Manchester"]
|     ["laura@example.com","2070","04ap67","lg2070","Laura","Grey","Depot","London"]
|     ["craig@example.com","4081","30no86","cj4081","Craig","Johnson","Depot","London"]
|     ["mary@example.com","9346","14ju73","mj9346","Mary","Jenkins","Engineering","Manchester"]
|     ["jamie@example.com","5079","09ja61","js5079","Jamie","Smith","Engineering","Manchester"]
___________________________________________________________________
(page generated 2021-09-27 23:00 UTC)