hngopher.com

       [HN Gopher] Show HN: Tidy Viewer - a cross-platform CSV pretty p...
       ___________________________________________________________________
        
       Show HN: Tidy Viewer - a cross-platform CSV pretty printer for
       viewer enjoyment
        
       Author : flusteredBias
       Score  : 314 points
       Date   : 2021-09-27 13:15 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | wly_cdgr wrote:
       | I thought this article was about a new way of understanding
       | actual televisions
        
         | brap wrote:
         | Same. Read this as some profound observation on what TVs
         | actually are. Even thought to myself "huh, I guess OP does have
         | a point..."
        
           | c3534l wrote:
           | What is CSV other than a matrix of data? It's matrices all the
           | way down.
        
           | BrianOnHN wrote:
           | > "huh, I guess OP does have a point..."
           | 
           | LMAO, me too!
           | 
           | Guess I've been spending too much time thinking.
           | 
           | Edit: this reminds me of Jimmy Kimmel's segment where they
           | "bleep and blur whether they need it or not" so that innocent
           | TV clips appear to have profanity/innuendo etc.
        
             | flusteredBias wrote:
             | LMAO. I wish.
        
       | sigg3 wrote:
       | This is kinda cool, but why not just use column?
       | 
       | The video of catting into tv is equivalent to:
       | column -s, -t FILE
        
         | mixmastamyk wrote:
         | It shows why in the readme; the benefits are small but real.
         | Others have recommended improvements.
        
       | timwis wrote:
       | csvkit's csvlook works similarly but outputs a markdown table.
        
       | mejari wrote:
       | Does it really need libc6 >= 2.31? Having problems installing it
       | on Ubuntu 16.04 LTS because of that dependency.
        
         | flusteredBias wrote:
         | Open an issue. I would have to look into that.
        
       | flusteredBias wrote:
       | I spend a lot of time in the terminal and want to quickly glance
       | at CSV files without making a new script, opening Excel, or
       | using a tui. I made tidy-viewer (tv) because current tools like
       | cat and column were not pretty enough.
       | 
       | tv modifies raw files in the following ways:
       | 
       | 1. NA detection and highlighting
       | 2. Printing only significant digits
       | 3. Header and footer meta data
       | 
       | I have been using this a lot at work. There is a lot more work to
       | do, but it is in a usable state.
       | 
       | Give it a try! If you like it then star on Github!
        
         | mileza wrote:
         | I think it's a great first effort, but there are a number of
         | possible improvements to do. The most obvious one would be to
         | support passing the file as an argument instead of using cat or
         | the redirection operator every time. It's great that it works
         | with stdin to allow piping into it, but it's cumbersome if you
         | just want to take a file and print it, which will no doubt be a
         | common use case.
        
           | MobiusHorizons wrote:
           | Do you think 'tv <file.csv' does what you want well enough?
           | What is the behavior when you run 'tv file.csv'? Does it just
           | block waiting for input?
           | 
           | I think it's great for a visualizer like this to encourage
           | people to get used to the power of shell pipelines if
           | possible.
        
             | Someone wrote:
             | I think it's a bad idea to go against half a century of
             | conventions without good reason.
             | 
             | It's very surprising to see a tool that only works as a
             | filter, and doesn't take file paths as arguments.
        
             | GuB-42 wrote:
             | It works, but almost all UNIX commands that work on
             | pipelines can take a list of files as arguments. Out of the
             | commands I use regularly, "patch" is the only one that
             | works exclusively from stdin, probably because file
             | arguments have a different, somewhat obscure, and probably
             | historical meaning.
             | 
             | If appropriate, using files as arguments instead of using
             | shell pipelines is a best practice. Commands can optimize
             | for that use case, print better error messages, etc...
             | 
             | And it is _not_ a good thing to encourage useless use of
             | cat. If your goal is to show how your tool is to be used
             | with pipelines, show an actually useful pipeline, for
             | example "sed '1b;/abc/!d' file.csv | tv". The "sed"
             | command prints the first line (header), and all lines
             | containing "abc".
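
For readers less fluent in sed, the filter described above (keep the header, then keep only rows containing "abc") can be sketched in Python. This is an illustrative equivalent and not part of tv: the function name `header_and_matches` and the sample data are made up for the example, and it matches a literal substring rather than a regular expression.

```python
import io


def header_and_matches(lines, needle):
    """Yield the first line (the header) and every later line containing needle."""
    for index, line in enumerate(lines):
        # Line 0 is always kept, mirroring sed's `1b`; the rest must match.
        if index == 0 or needle in line:
            yield line


# Hypothetical sample data standing in for file.csv.
sample = io.StringIO("id,name\n1,abc\n2,xyz\n3,zabcz\n")
filtered = list(header_and_matches(sample, "abc"))
print("".join(filtered), end="")
```

The output of such a filter could then be piped into tv, which is the kind of pipeline the comment argues is worth demonstrating.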
        
         | einpoklum wrote:
         | First of all - kudos on tackling this task - it is indeed very
         | annoying to get CSVs to render nicely on a terminal.
         | 
         | 1. How does tidy-viewer compare with csvlook?
         | 
         | 2. Looking at the demo video, there seems to be an odd fixation
         | with "N/A". The CSV spec, AFAIK, doesn't recognize this phrase.
         | I don't understand why someone would expect a quoted string
         | field whose raw characters are "n/a" should be rendered as
         | anything other than n/a (i.e. lowercase and without the
         | quotes). I'm guessing maybe in your workflow you want to use
         | that phrase a lot, but for a tool for the general public I'd
         | not do this kind of interpretation; and I would leave an empty
         | field as empty.
         | 
         | 3. tidy-viewer seems to require "unstable library features", or
         | at least ones which were unstable as of Rust 1.48.0 . It would
         | be nice if you could be compatible with older rust
         | distributions/versions.
         | 
         | 4. Many systems, especially older ones, especially ones which
         | you access remotely and don't have root privileges on, won't
         | have a rust installation. It would be even more convenient if
         | you could provide binaries with little or no extra dynamic
         | library dependencies, which could be used on older / rustless
         | systems. I realize this is a tall order, however.
         | 
         | 5. What about scrolling? The worst part of viewing CSVs is
         | having to handle wide ones which exceed the terminal width, and
         | having decent horizontal as well as vertical scrolling ability.
         | less doesn't cut it, because it doesn't keep the header row,
         | plus it doesn't recognize field widths.
         | 
         | 6. tidy-viewer does not seem to support wrapping longer fields
         | onto multiple terminal lines.
         | 
         | 7. When the user doesn't specify the color scheme, are you
         | choosing one based on the terminal colors, or are you using
         | absolute color values? I suggest the former.
         | 
         | 8. tidy-viewer loads and parses the entire CSV immediately;
         | and, in fact, seems to keep two copies of it in memory at once.
         | This means it cannot be used with large files without
         | thrashing; and even if your CSV does fit in global memory, it
         | will still be kind of unusable, trying to dump gigabytes onto
         | the terminal.
         | 
         | Bottom line: A nice initial effort, but the more serious
         | challenges are yet to be tackled, plus needs to be more
         | robustly cross-platform.
        
           | flusteredBias wrote:
           | > First of all - kudos on tackling this task - it is indeed
           | very annoying to get CSVs to render nicely on a terminal.
           | 
           | > How does tidy-viewer compare with csvlook?
           | 
           | The most important issue to me is that csvlook is a much less
           | pleasant viewing experience, but there is also this:
           | csvlook reads and parses all of the data. Try pushing
           | diamonds.csv to csvlook. When I do it on my machine it takes
           | 15.228 seconds while tv takes 0.0042 seconds. So tv is much
           | faster, but speed is not the goal of the package. tv's
           | purpose is to maximize viewer enjoyment.
           | 
           | > 2. Looking at the demo video, there seems to be an odd
           | fixation with "N/A". The CSV spec, AFAIK, doesn't recognize
           | this phrase. I don't understand why someone would expect a
           | quoted string field whose raw characters are "n/a" should be
           | rendered as anything other than n/a (i.e. lowercase and
           | without the quotes). I'm guessing maybe in your workflow you
           | want to use that phrase a lot, but for a tool for the general
           | public I'd not do this kind of interpretation; and I would
           | leave an empty field as empty.
           | 
           | I could not say it better than this:
           | 
           | > The norm of treating missing data as NA exists in R (which
           | the developer of this is clearly inspired by based on the
           | GitHub readme.). Pandas in Python is stuck with NaN for
           | numeric types (not quite correct) and "" or None for string
           | types. Personally I like the choice to both explicitly render
           | missing data in colour and to apply NA as a placeholder text
           | to display that colour.
           | 
           | > 3. tidy-viewer seems to require "unstable library features",
           | or at least ones which were unstable as of Rust 1.48.0 . It
           | would be nice if you could be compatible with older rust
           | distributions/versions.
           | 
           | That is a good point. I also release binaries, which I think
           | makes this requirement less pressing. What are your thoughts?
           | 
           | > 4. Many systems, especially older ones, especially ones which
           | you access remotely and don't have root privileges on, won't
           | have a rust installation. It would be even more convenient if
           | you could provide binaries with little or no extra dynamic
           | library dependencies, which could be used on older / rustless
           | systems. I realize this is a tall order, however.
           | 
           | With github actions I auto-build binaries for many OSes. See
           | https://github.com/alexhallam/tv/releases/tag/0.0.13
           | 
           | > 5. What about scrolling? The worst part of viewing CSVs is
           | having to handle wide ones which exceed the terminal width,
           | and having decent horizontal as well as vertical scrolling
           | ability. less doesn't cut it, because it doesn't keep the
           | header row, plus it doesn't recognize field widths.
           | 
           | Scrolling is nice. To offer scrolling the only option I am
           | aware of is turning this _cli_ into a _tui_. I made the
           | choice early on to take the more minimal path and stick to a
           | _cli_. The goal is to be a `column` replacement, not a
           | spreadsheet replacement.
           | 
           | > 6. tidy-viewer does not seem to support wrapping longer
           | fields onto multiple terminal lines.
           | 
           | The goal is to glance at the data as a whole, not at a cell or
           | field. If there are cells with long text they get cut at 20
           | characters. I like this a lot. I would prefer to know that
           | there is a lot of text that I can dig into later, but when I
           | am glancing at the csv I just want an overall picture. In my
           | view tables of data are data visualizations, meaning that I
           | don't have to show everything to understand enough of it.
           | 
           | > 7. When the user doesn't specify the color scheme, are you
           | choosing one based on the terminal colors, or are you using
           | absolute color values? I suggest the former.
           | 
           | Great question. I want to eventually add the ability for
           | users to make a config file with their own colors. At this
           | time I just have absolute presets. If you are interested I
           | would happily take a contribution that allows users the
           | option to configure tv with some dotfile.
           | 
           | > 8. tidy-viewer loads and parses the entire CSV immediately;
           | and, in fact, seems to keep two copies of it in memory at
           | once. This means it cannot be used with large files without
           | thrashing; and even if your CSV does fit in global memory, it
           | will still be kind of unusable, trying to dump gigabytes onto
           | the terminal.
           | 
           | That is almost true. tidy-viewer reads the entire csv, but
           | only parses the head. If I knew of a way to get the number of
           | rows and columns of a csv without reading the whole file then
           | I would. I know there is a good deal more room for memory
           | optimization. This is not my strength and I am still
           | learning.
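
One possible approach to the row-count question above, offered as a sketch only: the whole file still has to be read to count its rows, but nothing beyond the head needs to stay in memory. This is not how tv is implemented. The function name `head_and_shape` and the `head_rows` default of 25 are assumptions chosen for illustration, and the parsing relies on Python's standard `csv` module.

```python
import csv
import io
from itertools import islice


def head_and_shape(stream, head_rows=25):
    """Return (header, head, n_rows, n_cols) while holding only the head in memory."""
    reader = csv.reader(stream)
    header = next(reader, [])
    # Keep the first few data rows for display.
    head = list(islice(reader, head_rows))
    # Count the remaining rows one at a time; each row is discarded after counting.
    remaining = sum(1 for _ in reader)
    return header, head, len(head) + remaining, len(header)


# Hypothetical input: a header plus 100 data rows.
data = io.StringIO("a,b,c\n" + "".join(f"{i},{i * 2},{i * 3}\n" for i in range(100)))
header, head, n_rows, n_cols = head_and_shape(data, head_rows=5)
print(n_rows, n_cols, head[0])
```

Memory use here is bounded by the size of the head plus one row at a time, at the cost of still scanning the full input once.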
           | 
           | > 9. Bottom line: A nice initial effort, but the more serious
           | challenges are yet to be tackled, plus needs to be more
           | robustly cross-platform.
           | 
           | Thanks for the compliment. It is still a work in progress.
        
           | notafraudster wrote:
           | The norm of treating missing data as NA exists in R (which
           | the developer of this is clearly inspired by based on the
           | GitHub readme.). Pandas in Python is stuck with NaN for
           | numeric types (not quite correct) and "" or None for string
           | types. Personally I like the choice to both explicitly render
           | missing data in colour and to apply NA as a placeholder text
           | to display that colour.
        
         | sneak wrote:
         | Your asciinema playback made me twitch. Is the lack of a
         | trailing space in your PS1 intentional?
        
         | IgorPartola wrote:
         | This looks great! I wonder how long it'll be until someone
         | posts a long awk snippet that will do something similar and
         | claims this isn't progress, but rest assured that they are
         | wrong. I'm adding tv to my toolbox.
        
           | flusteredBias wrote:
           | Thanks! I appreciate the compliment!
        
         | contravariant wrote:
         | The NA detection and highlighting is nice but I'm not sure how I
         | feel about showing anything other than the exact textual value.
         | I don't mind abridging quotes when they're not necessary, but
         | showing "N/A", NA,, etc. as the same value is a bit iffy.
        
           | psadauskas wrote:
           | When presented with a similar problem, I tend to use non-ASCII
           | characters. For example, in my `~/.psqlrc` I have:
           | \pset null ␀
           | 
           | Looks like this in output:
           | 
           |  40 |  ␀ | 2021-09-23 20:42:32.536571 |
           |  41 | 15 | 2021-09-23 20:42:33.177474 |
           |  42 | 19 | 2021-09-23 20:42:33.212133 |
           |  43 |  ␀ | 2021-09-23 20:42:33.247346 |
        
             | a1369209993 wrote:
             | I was going to complain about null bytes in text (never,
             | period), but then realized you actually did mean the U+2400
             | SYMBOL FOR NULL[0] character itself. That's surprisingly
             | viable (though you do now have to worry about the string
             | "\xE2\x90\x80" ending up in your data).
             | 
             | 0: Which is actually incorrectly named - it should be
             | "SYMBOL FOR NUL".
        
           | flusteredBias wrote:
           | It is rough. There are many ways that different tools put
           | NAs, na, N/A, "", etc. in a file. To choose only "NA" would
           | mean I would be excluding the output of other tools. I chose
           | accessibility over specificity. #trade-offs.
        
             | ayoubElk wrote:
             | Maybe you could just background highlight the empty cells?
        
               | flusteredBias wrote:
               | Well, I think being promiscuous with "NA", "N/A", nan,
               | etc. is a separate issue from a blank cell. A blank cell
               | is literally missing. That should be filled with NA.
        
               | nicoburns wrote:
               | > A blank cell is literally missing. That should be
               | filled with NA.
               | 
               | Why? "NA" stands for "Not Applicable", but a blank cell
               | in a CSV could represent any number of things of which
               | "Not Applicable" is only one.
        
               | flusteredBias wrote:
               | haha. You are right "NA" stands for "Not Applicable".
               | That is not always how people/programs use it, though.
               | What are some alternatives that you would suggest? I am
               | happy to learn.
        
               | nicoburns wrote:
               | I would suggest something similar to what other people have
               | suggested where you color the background of the cell red
               | and then just display the literal content of the cell. I
               | think it would be reasonable to have this configurable
               | via command line arguments though, so if you like the
               | "NA" that could also be a mode.
               | 
               | Perhaps it would make sense to have a "pretty" mode and a
               | "literal" mode (which would also turn off the clever
               | processing of numbers)?
        
             | eevilspock wrote:
             | Simple: provide CLI switches to let the user decide what
             | they want for NA detection (current behavior as default,
             | user can provide alternate NA values, per the source file
             | or the natural language it is expressed in), and how they
             | want them displayed, whether as-is, blank or a consistent
             | custom value (as-is should be the default).
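
The switch-driven behavior proposed above could look roughly like the following sketch. It assumes a configurable set of NA spellings and an optional display placeholder, with "as-is" display as the default. The names `DEFAULT_NA_STRINGS` and `render_cell` and the particular spellings in the set are hypothetical; they are not taken from tv's source.

```python
# Hypothetical default spellings; a real tool would expose these via CLI switches.
DEFAULT_NA_STRINGS = frozenset({"", "NA", "N/A", "na", "n/a", "NaN", "nan", "null"})


def render_cell(raw, na_strings=DEFAULT_NA_STRINGS, placeholder=None):
    """Return (text_to_display, is_missing) for one raw cell value.

    With placeholder=None the raw text is shown as-is and only flagged,
    so a caller can highlight it without rewriting the content.
    """
    is_missing = raw.strip() in na_strings
    if is_missing and placeholder is not None:
        return placeholder, True
    return raw, is_missing


print(render_cell("N/A"))                    # flagged, shown as-is
print(render_cell("N/A", placeholder="NA"))  # flagged, shown as a custom value
print(render_cell("42"))                     # not missing
```

Separating detection (the `is_missing` flag) from display (the returned text) lets highlighting and value substitution be toggled independently, which is the distinction several commenters in this subthread are asking for.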
        
             | contravariant wrote:
             | Fair enough but that doesn't explain why you chose to
             | display all of them as "NA". As you say there are lots of
             | different ones, hence it would be a bad idea to pick one as
             | the 'default' to display. To me it's important whether
             | something is missing, filled with "N/A", or "null", or "Not
             | Applicable" etc.
        
         | asicsp wrote:
         | Hey, if you are able to edit the title of this submission, can
         | you add `Show HN: ` prefix?
         | 
         | See https://news.ycombinator.com/showhn.html for details.
        
           | flusteredBias wrote:
           | Blast, I am sorry. It looks like I can't edit the title
           | anymore. Otherwise I would make the change.
        
         | moritonal wrote:
         | Man.. TV is such a good name for a visualiser tool that it'd be
         | excellent if it could pretty-print any content given to it.
         | 
         | Does your framework support the idea of pre-parsing the file's
         | content and selecting an appropriate renderer, or is it fairly
         | tied to CSVs?
        
           | inostia wrote:
           | I actually like `tidy` better and honestly would rather have
           | `cat a.csv | tidy ...`. But it's probably already a thing.
        
             | hnlmorg wrote:
             | Nice idea. I might actually work on this myself
        
           | flusteredBias wrote:
           | I would need more details on what you mean by "pre-parsing".
           | It works with any delimiter: it could be comma-separated,
           | pipe-separated, etc.
        
             | mixmastamyk wrote:
             | Doesn't need to be pre-parsed. Perhaps give the filename to
             | the utility instead of content via stdin. Then filename
             | gives a hint. If there is none, run "file filename" (via
             | library) beforehand.
        
         | berlinquin wrote:
         | Cool project! I'm familiar with column, and this looks like a
         | good replacement.
         | 
         | Curious, how do you handle formatting on cells with long
         | strings that need to overflow to multiple lines? As soon as you
         | try to optimize the column widths for table length, you start
         | hitting an NP-hard problem.
         | 
         | https://quintenkent.com/content/column-problem.html
        
           | flusteredBias wrote:
           | I actually read that article when I started making the
           | package. You can see some of the input data here
           | https://github.com/alexhallam/tv/blob/main/data/a.csv. I let
           | the user choose how long the max column width should be, then
           | append "...". The default value is 20 characters.
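
The truncation rule described above sidesteps the column-width optimization problem entirely: each cell is cut independently at a fixed width. A minimal sketch, assuming the cut happens at `max_width` characters and the marker is appended afterwards (whether tv counts the marker toward the width is not stated in the thread); `truncate_cell` is a hypothetical name:

```python
def truncate_cell(text, max_width=20, marker="..."):
    """Cut text at max_width characters and append a marker if anything was cut."""
    if len(text) <= max_width:
        return text
    return text[:max_width] + marker


print(truncate_cell("short value"))
print(truncate_cell("a very long free-text field that will not fit"))
```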
        
       | TacticalCoder wrote:
       | It's very weird for a project made to "maximize viewer enjoyment"
       | to not put a space after the prompt. The one saved character on
       | the line is definitely not worth the illegible resulting line:
       | this doesn't maximize my enjoyment at all when viewing the
       | examples.
        
         | drcongo wrote:
         | Some might call this comment petty, but that was making my
         | brain itch too.
        
         | stavros wrote:
         | That's just the shell prompt on the recording machine. You can
         | use your own prompt in your shell.
        
         | hnlmorg wrote:
         | That's a really uncharitable comment considering it's the
         | developer's prompt and has nothing to do with `tv` aside from
         | appearing in the asciinema demo.
        
       | dotancohen wrote:
       | Very nice! How does it handle CSVs that are wider or longer than
       | the terminal? How does it deal with columns that are
       | exceptionally long, or multiline?
       | 
       | Often when working with large CSV files, I'll need to show or
       | hide specific columns, especially if they are very long. Also,
       | grepping the output for a specific line will hide the header as
       | well, not to mention make the output unnecessarily wide if non-
       | matching lines have longer fields than do the matching lines. So
       | a built-in grepping feature would make this very useful.
        
         | flusteredBias wrote:
         | > How does it handle CSVs that are wider or longer than the
         | terminal?
         | 
         | Columns that are exceptionally long are handled by cutting and
         | appending an ellipsis if the value is over 20 characters.
         | 
         | > a built-in grepping feature would make this very useful.
         | 
         | see the following for csv data manipulation:
         | 
         | xsv - Command line csv data manipulation. Rust
         | 
         | csvtk - Command line csv data manipulation. Go
         | 
         | tsv-utils - Command line csv data manipulation toolkit. D
         | 
         | q - Command line csv data manipulation query-like. Python
         | 
         | miller - Command line data manipulation, statistics, and more.
         | C
        
           | dotancohen wrote:
           | Terrific, thank you!
        
       | rlue wrote:
       | Why not visidata?
       | 
       | https://www.visidata.org/
       | https://www.youtube.com/watch?v=N1CBDTgGtOU
       | 
       | (It does much, much more than pretty printing, but no reason you
       | can't use it for that.)
        
         | qwertyuiop_ wrote:
         | This is why I love HN. Never knew this existed. It has become
         | my favorite tool in the 5 mins since I installed it. Also
         | reminds me of Mainframe programs that I encountered in the
         | past. I wish we had more tools like this instead of electron
         | mouse click based apps for people who prefer speed and
         | keyboard.
        
         | flusteredBias wrote:
         | I love visidata! But when I want to just glance at a csv file I
         | reach for tv (I used to use `column` which is more of a tv
         | competitor than visidata). This is for a couple reasons.
         | 
         | 1. tv gives a quick summary of the count of rows and columns
         | 
         | 2. tv shows all columns at the bottom that don't fit in the
         | terminal. With vd I have to scroll on wide data.
         | 
         | 3. tv guides the eye to missing data better with NA highlights
         | 
         | 4. tv has sigfig logic that is better. I work with files where
         | the decimal dust can become long. Those unnecessary characters
         | push remaining columns off the screen. This means the user
         | would need to scroll over to see additional columns. I generally
         | think it is better to avoid additional key presses if possible.
         | 
         | 5. tv is fast for large files. It does not have to read and
         | format all of the data like vd. tv is focused more on _looking_
         | at the file and not _operating_ on it. It does not have to do
         | as much as vd. That helps tv with what it is uniquely good at.
         | "Do one thing and do it well"
         | 
         | It does not matter if your file is really wide (lots of
         | columns) or really long -- tv will give the user a compact
         | useful pretty print of the data. Why not use vd as a TUI
         | spreadsheet and tv for glancing at csv files? They are both
         | great tools in my eyes with different purposes.
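
To illustrate the "decimal dust" point in item 4, here is a generic way to round a number to a fixed count of significant digits so that long fractional tails do not widen a column. This is a textbook technique and an assumption about the general idea only; tv's actual sigfig rules are not described in this thread, and `round_sig` is a hypothetical name.

```python
import math


def round_sig(value, digits=3):
    """Round value to the given number of significant digits."""
    if value == 0 or not math.isfinite(value):
        return value
    # Position of the leading digit: 3 for 1234.5, -4 for 0.000123.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


print(round_sig(1234.5678))
print(round_sig(0.000123456))
```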
        
           | saulpw wrote:
           | Hey there, VisiData author here. Nice work with tv! I'm sure
           | it's more useful than VisiData for certain use cases. I just
           | want to clear some things up since there are a few
           | misconceptions here (which will happen if you don't use
           | VisiData a lot):
           | 
           | 1. In VisiData, the number of rows is always shown in the
           | lower right, and you can see the number of columns with
           | either Ctrl+G or a list of the columns with Shift+C. Or
           | Shift+I for the list of columns with summary statistics
           | (mode/distinct/errors/etc). This is an extra keystroke, but
           | the amount of data you can get with that keystroke more than
           | justifies it.
           | 
           | 5. VisiData will instantly open and show any file it can, and
           | continue to load the rest until it's done or you press Ctrl+C
           | (or quit). Everything in VisiData is lazily evaluated, so
           | it's not actually doing any more work than tv when you view
           | the first page of rows, and then you can see the next few
           | pages of rows with only one keystroke (PgDn, as opposed to
           | having to edit a command and rerun it). Fewer keypresses ftw!
           | 
           | A lot of people think VisiData is a TUI spreadsheet, but vd
           | is not a "spreadsheet" in the classic sense, as it's not
           | cell-based. Its primary use-case is exploring and wrangling
           | tabular data. It just turns out that this is what a lot of
           | people are doing with their spreadsheets, but they have to
           | bend over backwards to get Excel/whatever to play nice with
           | their data's structure. By the same token, if you try to do
           | little single-cell formulas in VisiData, it's going to be
           | quite difficult.
           | 
           | For people who like static binaries and only need to view a
           | few rows of CSV files, or produce part of a larger report in
           | a pipeline, tv could be a better fit than VisiData,
           | especially if it continues to be maintained. I'm always
           | excited to see new data tools in the terminal space!
        
         | certifiedloud wrote:
         | Visidata is a great interactive tool. TV seems like it would be
         | better when scripting, or in one-liners.
        
           | dotancohen wrote:
           | For scripting I would use grep and cut, maybe awk. For
           | scripting with CSV files, at least in my experience, you
           | usually want specific columns from specific lines.
           | 
           | If TV had a switch for specifying only certain columns, that
           | would make the job much easier.
        
             | flusteredBias wrote:
             | sounds like you are looking for xsv. I like that tool a lot
             | for selecting specific columns.
        
       | semireg wrote:
       | Consider adding a code snippet showing a.csv output. I had to
       | watch a video just to see text.
        
         | flusteredBias wrote:
         | I have some work to do on the README. I will show the output
         | better. The difficulty with showing the output only is that it
         | does not capture the coloring. Maybe I will show the output, or
         | add a picture, or have an animated gif. Maybe all three.
        
         | GrayShade wrote:
         | Not OP, but there's a screenshot under
         | https://github.com/alexhallam/tv#1-na-comprehension.
         | 
         | Which is actually worse for screen reader users, I suppose.
        
         | Someone wrote:
         | And a list of options. Reading the first 80 lines of
         | https://github.com/alexhallam/tv/blob/main/src/main.rs was in
         | some sense more educational than the readme.
         | 
         | It, for example, allowed me to make an educated guess as to the
         | answer to the question "how does this handle huge files?". It
         | by default only reads 25 lines.
         | 
         | (That makes the example from the header:
         | 
         |     cat diamonds.csv | head -n 35 | tv
         | 
         | a bad example. You shouldn't need that _head_ in-between.)
         | 
         | However, line 167 says
         | 
         |     //.take(row_display_option + 1)
         | 
         | That seems to indicate this reads the entire file into memory,
         | and that guess wasn't that educated at all.
        
       | 6502nerdface wrote:
       | A while ago, Two Sigma Investments open-sourced its own curses-
       | based internal tool for pretty printing tabular data:
       | https://github.com/twosigma/ngrid
        
       | stavros wrote:
       | This is quite nice, but I don't like how it cuts off the output
       | (instead of making it scrollable). Also, why require the use of
       | `cat`? Accepting a filename so I can do `tv foo.csv` would be
       | much more ergonomic, in my opinion.
        
         | bityard wrote:
         | Why waste code teaching the program how to open files correctly
         | when the shell already knows?
         | 
         | tv < foo.csv
        
           | stavros wrote:
           | Because I'd rather the program did the work and let me skip
           | the extra keystroke.
        
       | GrayShade wrote:
       | Cool project (and written in my favourite language), but I really
       | hate how there's no space at the end of your prompt.
        
         | stavros wrote:
         | I have to agree, I felt dirty watching the recording. The
         | project looks great, though.
        
         | BiteCode_dev wrote:
         | It feels like when somebody opens a ( but doesn't close it.
        
           | hoosieree wrote:
           | )
        
             | BiteCode_dev wrote:
             | There is such a big latency in HN for self closing
             | parenthesis.
        
         | flusteredBias wrote:
         | Love the idea! Already opened the issue and will merge a PR
         | before the next release.
        
           | [deleted]
        
           | GrayShade wrote:
           | Ah, I didn't mean a newline at the end of the output. Just a
           | space after your shell prompt ($), see e.g.
           | `user@~/code/data$cat` in
           | https://github.com/alexhallam/tv/blob/main/img/column_v_tv2.....
        
             | flusteredBias wrote:
             | lol. Got it.
        
         | lambic wrote:
         | And the use of cat on a single file..
        
           | netcraft wrote:
           | I don't understand your comment - could you expand on it?
        
             | [deleted]
        
             | ComputerGuru wrote:
             | cat just regurgitates the contents of the file, but the
             | resulting piped fd is non-seekable. Since almost every
             | command that can operate on a file from stdin can also
             | operate on the file by name/path, at best this is just a
             | needless invocation of a process (i.e. `tv foo.csv` should
             | have been used instead of `cat foo.csv | tv`) - if the app
             | in question can't handle paths, then you can have the shell
             | pipe the file into it instead (e.g. `tv < foo.csv`). At
             | worst, the recipient program would need to buffer the
             | entire contents of the input if it needs to perform non-
             | sequential operations on the source data - this is the case
             | with things like `tac` that need to seek to the end of the
             | input (see https://github.com/neosmart/tac for how
             | `cat foo | tac` requires buffering but both `tac foo` and
             | even `tac < foo` don't).
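
A minimal Python sketch of the seekability point above, not taken from tv or tac. It asks the OS whether a descriptor supports lseek, once for a regular file (roughly what a program sees with `tv < foo.csv`) and once for a pipe (roughly what it sees with `cat foo.csv | tv`). The helper name `is_seekable` is made up for illustration, and the behaviour assumes a POSIX system.

```python
import os
import tempfile


def is_seekable(fd: int) -> bool:
    """Return True if the descriptor supports lseek (hypothetical helper)."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError:
        # Pipes reject lseek, so a reader can only consume them front to back.
        return False


# A regular file, similar to stdin after `program < foo.csv`.
with tempfile.TemporaryFile() as regular_file:
    file_seekable = is_seekable(regular_file.fileno())

# A pipe, similar to stdin after `cat foo.csv | program`.
read_end, write_end = os.pipe()
try:
    pipe_seekable = is_seekable(read_end)
finally:
    os.close(read_end)
    os.close(write_end)

print("regular file seekable:", file_seekable)
print("pipe seekable:", pipe_seekable)
```

A program that needs random access can check this once at startup and only fall back to buffering when the input turns out to be a pipe.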
        
               | netcraft wrote:
               | thank you! I knew some of that but learned a lot too.
        
               | mthoms wrote:
               | Google the phrase "useless use of cat".
               | 
               | To some, it's a faux-pas. Personally, I like the
               | aesthetics of cat for my own scripts. It follows the
               | "pipe flowing" idiom better.
               | 
               | There are performance reasons why "useless cat" should be
               | avoided though. So avoid it where performance is
               | important (or when some other hardcore CLI jockey is
               | going to see your code :))
        
               | hibbelig wrote:
               | I understand that people say you can replace
               | foo < bar
               | 
               | with
               | <bar foo
               | 
               | but I've never been able to get myself to do this.
        
       | davidatbu wrote:
       | What is the value-add when compared to using `xsv` to pretty
       | print? Is it only the fact that it highlights NA values?
        
         | flusteredBias wrote:
         | xsv is one of my favorite data manipulation tools. Also, the
         | author of that package is one of the best developers I know. I
         | use xsv _with_ tv. I normally pipe the output of xsv to tv.
         | 
         | 1. As you noted, NA comprehension.
         | 2. Column overflow logic for different sized terminals.
         | 3. Summary metadata in the header.
         | 4. Significant digits logic. This allows users to view more
         | columns than they would otherwise view due to decimal dust
         | shifting the columns over.
         | 5. This is the most important! It looks really pretty!
        
       | thamer wrote:
       | Some (most?) tools that output data in columns and fit each one
       | to the largest value in that column need to scan the whole file
       | as a first pass just to start displaying data.
       | 
       | Not only is it the case with this tool, but from what I'm reading
       | in main.rs it looks like it's also loading _the whole file_ in
       | memory. I was going to say that scanning the file was a deal-
       | breaker, but if true this is much more resource-intensive.
       | 
       | This looks like a nice tool, but these design choices seem to
       | limit its use to relatively small files. It could be updated to
       | have a read-ahead buffer instead and adjust its output as new
       | lines are discovered with values of different width, although
       | doing this without a jarring resize could be challenging.
       | 
       | Could someone with better knowledge of Rust than mine confirm
       | this?
       | 
       | I see the full dataset being loaded here[1] and the column widths
       | being computed here.[2]
       | 
       | [1]
       | https://github.com/alexhallam/tv/blob/main/src/main.rs#L183-...
       | 
       | [2]
       | https://github.com/alexhallam/tv/blob/main/src/main.rs#L218-...
        
         | flusteredBias wrote:
         | > these design choices seem to limit its use to relatively
         | small files
         | 
         | 1. As a rule-of-thumb, I have been working on functionality
         | before optimization. That said, `tv` is really fast. It is
         | completely false that `tv` only works for relatively small
         | files. I just pushed a 624MB file to `tv`. It ran in 2.8
         | seconds. With `column` it takes 5.0 seconds. Now, I would love
         | help from programmers smarter than me. I am sure there are a
         | lot of optimization gains to be had in `tv`. I just wanted to
         | make sure potential users are not misled. `tv` is performant.
         | 
         | > Some (most?) tools that output data in columns and fit each
         | one to the largest value in that column need to scan the whole
         | file as a first pass just to start displaying data.
         | 
         | > Not only is it the case with this tool, but from what I'm
         | reading in main.rs it looks like it's also loading the whole
         | file in memory.
         | 
         | 2. `tv` reads once, but parses partly. This means that it reads
         | the full file only to grab the number of rows. It only
         | parses (take) the first n rows.
         | 
         | https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...
         | 
         | https://github.com/alexhallam/tv/blob/b548f0d19f64438d53f732...
        
           | TAForObvReasons wrote:
           | If the goal is to calculate the correct column width, you
           | have to do one pass through the data before writing the first
           | row.
           | 
           | If the file can be read multiple times (not a UNIX stream),
           | you can just read the file twice.
           | 
           | If the file is a stream, instead of retaining the entire
           | dataset in memory, you can write to a temporary file and re-
           | parse it after calculating the widths.
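
The temp-file approach described above could look roughly like this in Python. This is a sketch under assumptions, not how tv works: the function name `print_aligned` and the two-space column gap are invented for the example. The first pass copies the stream into a temporary file while tracking the widest cell per column; the second pass re-parses the spooled copy and prints it aligned, so the whole dataset never has to sit in memory.

```python
import csv
import io
import tempfile


def print_aligned(stream, out):
    """Spool a CSV stream to disk, then re-parse it with known column widths."""
    widths = []
    with tempfile.TemporaryFile(mode="w+", newline="", encoding="utf-8") as spool:
        writer = csv.writer(spool)
        # Pass 1: copy every row to the temp file and record the widest cell.
        for row in csv.reader(stream):
            writer.writerow(row)
            for i, cell in enumerate(row):
                if i == len(widths):
                    widths.append(0)
                widths[i] = max(widths[i], len(cell))
        # Pass 2: rewind the temp file and print with the final widths.
        spool.seek(0)
        for row in csv.reader(spool):
            cells = [cell.ljust(widths[i]) for i, cell in enumerate(row)]
            out.write("  ".join(cells).rstrip() + "\n")


sample = io.StringIO("carat,cut\n0.23,Ideal\n0.2100000,Premium\n")
buffer = io.StringIO()
print_aligned(sample, buffer)
print(buffer.getvalue(), end="")
```

The row count falls out of the first pass for free, so a tool could still report the file's dimensions before printing.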
        
             | flusteredBias wrote:
             | The correct column width is calculated from the first n
             | rows, not the full file.
             | 
             | A stream does not work for tv because a stream does not
             | know how many rows are in the file a priori. Displaying the
             | dimensions of the file is a priority for `tv`. I am very
             | happy with that trade-off. I would rather know the
             | dimensions of a file than have a file stream of unknown
             | dimensions.
        
               | adamdusty wrote:
               | If you did it the way he's talking about you would stream
               | through the file to find how many rows and write the file
               | as a temp file that you could re-parse for the actual
               | data.
               | 
               | I'm not saying you should or shouldn't, but your use case
               | doesn't bar you from using streams.
        
               | flusteredBias wrote:
               | I see. Thanks for the clarification.
        
         | eevilspock wrote:
         | I like this idea. I don't think it would be jarring if the
         | read-ahead buffer was a minimal number of lines, i.e. looking
         | like distinct pages. The default could be at least the line
         | height of the terminal, or some multiple.
         | 
         | There could be an option to redisplay the header row for
         | resized "pages".
         | 
         | There could be a CLI switch giving the user control, i.e. make
         | everyone happy.
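
One way to picture the paging idea above, as a hedged Python sketch. Nothing here comes from tv; `paged_rows`, `format_page` and the `page_size` parameter are hypothetical names. Each page reads ahead only `page_size` rows, repeats the header, and is aligned using the widths seen on that page alone.

```python
import csv
import io
from itertools import islice


def paged_rows(stream, page_size):
    """Yield (header, rows) pages, reading at most page_size data rows ahead."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        return
    while True:
        page = list(islice(reader, page_size))
        if not page:
            return
        yield header, page


def format_page(header, rows):
    """Align one page using only the column widths seen on that page."""
    table = [header] + rows
    widths = [0] * max(len(row) for row in table)
    for row in table:
        for i, cell in enumerate(row):
            widths[i] = max(widths[i], len(cell))
    return [
        "  ".join(cell.ljust(widths[i]) for i, cell in enumerate(row)).rstrip()
        for row in table
    ]


data = io.StringIO("id,name\n1,a\n2,bb\n3,a much longer name\n")
pages = [format_page(header, rows) for header, rows in paged_rows(data, page_size=2)]
for page in pages:
    print("\n".join(page))
    print()
```

A default for `page_size` could plausibly come from `shutil.get_terminal_size().lines`, with a command-line switch to override it.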
        
           | killjoywashere wrote:
           | I think data scientists will recognize this problem, and
           | there's a well-used solution: .head()
           | 
           | Just show me the top 5 rows. That's all most people are
           | looking for.
           | 
           | cat data/a.csv | tv --head
        
             | franga2000 wrote:
             | > Just show me the top 5 rows. That's all most people are
             | looking for.
             | 
             | Is it? I'd wager that can't be more than half its use at
             | most. Accessing a specific section that could be at any
             | section of the file is very common in my experience, as is
             | truly random access. Both of these, as well as the first
             | few rows use case, are far better served by a page system.
        
             | fiddlerwoaroof wrote:
             | Or:
             | head -n5 data/a.csv | tv
        
               | eli wrote:
               | Unless your csv has embedded line breaks
        
               | flusteredBias wrote:
               | Can you give an example of what you mean? If it breaks tv
               | then I would like to add it to the automated tests and
               | see if we can work on it.
        
               | eli wrote:
               | No sorry, I assume tv is fine. The problem is in assuming
               | `head -n5` gives you 5 rows and piping that into tv.
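
A small Python illustration of the point about embedded line breaks, using made-up sample data. A quoted field may contain a newline, so the number of physical lines that `head` counts can exceed the number of CSV records a parser sees.

```python
import csv
import io

# Header plus two records; the first record has a line break inside quotes.
raw = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# What a line-oriented tool such as head counts.
physical_lines = raw.splitlines()

# What a CSV parser counts.
records = list(csv.reader(io.StringIO(raw, newline="")))

print("physical lines:", len(physical_lines))
print("csv records:", len(records))

# Cutting after two physical lines stops in the middle of the quoted field.
print("second physical line:", physical_lines[1])
```

Here a cut after two physical lines would hand the parser a header and half a record, which is why row-based truncation is safer when done by a CSV-aware tool.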
        
               | flusteredBias wrote:
               | Oo I see. Thanks for clarifying.
        
       | iso1210 wrote:
       | Demo looks great, alas the prebuilt binaries don't work
       | $ wget https://github.com/alexhallam/tv/releases/download/0.0.10/tidy-viewer
       | --2021-09-27 16:47:43--  https://github.com/alexhallam/tv/releases/download/0.0.10/tidy-viewer
       | Resolving github.com (github.com)... 140.82.121.3
       | Connecting to github.com (github.com)|140.82.121.3|:443... connected.
       | HTTP request sent, awaiting response... 404 Not Found
       | 2021-09-27 16:47:44 ERROR 404: Not Found.
       | 
       | The deb in https://github.com/alexhallam/tv/target doesn't exist
       | either
        
         | flusteredBias wrote:
         | 1. Looks like you are using 0.0.10; the current version is
         | 0.0.13.
         | 
         | 2. I need to update the README. The binaries are here:
         | https://github.com/alexhallam/tv/releases/tag/0.0.13
         | 
         | I recently moved away from manually building binaries to
         | automated building for many architectures. I am still learning
         | how to use GitHub Actions to build for a matrix of
         | architectures.
        
       | leptoniscool wrote:
       | CSV is still a major part of the development ecosystem, amazing
       | that it has such staying power after all these years.
        
       | treve wrote:
       | I've made a Lotus 1-2-3 inspired CSV viewer for the terminal too.
       | Had big plans for it, but it's just a basic viewer now:
       | 
       | https://github.com/evert/csv123
        
         | flusteredBias wrote:
         | Keep at it. I think it looks great!
        
       | oaiey wrote:
       | There should be a new package next to the traditional GNU tools
       | containing the modern tools people need, e.g. jq, curl or tv.
       | Sometimes I really miss the extended software package on some
       | machines.
        
         | flusteredBias wrote:
         | Sounds like a good "awesome" page on github.
        
       | gurgeous wrote:
       | Hey, great start. I spend half my day in CSVs and I am definitely
       | your target audience. Most of the time I use bat, visidata or
       | tabview. In many ways tabview is the best, though recently the
       | project has been abandoned.
       | 
       | tv looks excellent. Fun name. I think if you added a couple of
       | features it would ascend to my toolbox:
       | 
       | (1) scrolling (horizontal and vertical)
       | 
       | (2) better command line parsing. Running "tv" without stdin or
       | arguments should produce an error/help. Running "tv xyz.csv"
       | should read that file.
       | 
       | Good luck!
        
       | ComputerGuru wrote:
       | XSV [0] can also pretty-print (minus the colors), but that's just
       | the tip of the iceberg as far as what it can do. It's very handy
       | for quick statistical analysis of CSV input.
       | 
       | [0]: https://github.com/BurntSushi/xsv
        
         | flusteredBias wrote:
         | I love xsv! I mention in the readme that command line data
         | manipulation tools are great complements to tv.
         | 
         | https://github.com/alexhallam/tv#tools-to-pair-with-tv
        
           | baggiponte wrote:
           | that's exactly the comment I was looking for! xsv is super
           | powerful and I think you might both draw inspiration from one
           | another. I read above that tv reads everything into memory:
           | maybe you can exploit some xsv tricks to avoid that. I feel
           | tv looks great to visualise the outcome at the end of a
           | pipeline, perhaps with xsv. I am no Ruby expert either, but
           | this can become a cool Homebrew binary: people on macOS will
           | use it too!
        
             | flusteredBias wrote:
             | I will add some Homebrew installation instructions. That is
             | now an open issue. I want this tool to be highly
             | accessible. Again, xsv is the best. I like the idea of
             | small utilities that specialize in a specific task.
             | 
             | From the Unix philosophy:
             | 
             | > Make each program do one thing well. To do a new job,
             | build afresh rather than complicate old programs by adding
             | new "features".
        
       | frankfrank13 wrote:
       | Brings a tear to my eye. Everything else I've used is so heavy-
       | handed, I just wanted the jq of CSVs
        
       | tambourine_man wrote:
       | This looks great.
       | 
       | How well does it handle the edge cases of CSV, like escaped
       | commas, quoted text, escaped quotes, and all that fun, fun stuff?
        
         | flusteredBias wrote:
         | If you come across an edge case that tv does not handle then
         | let me know. I will add a test csv file to the current
         | portfolio of test csvs.
         | https://github.com/alexhallam/tv/tree/main/data
        
         | GrayShade wrote:
         | Pretty well, I'd guess. It uses a well-tested CSV library.
        
         | flusteredBias wrote:
         | I have been using tv now for a couple months at work. It has
         | been working well on the data I see. If you find edge cases
         | then please open an issue with an example csv.
        
       | claimred wrote:
       | Wanted to mention that Windows PowerShell supports pretty CSV
       | printing out of the box, like so
       | Import-Csv .\Levels.csv | Format-Table
       | 
       | Count Level elevation Level name Name              Object type Unique ID
       | ----- --------------- ---------- ----              ----------- ---------
       | 1     -600.0000000    Store      -0,600 - Store    Level       dc611fed-1783-d759-053a-b19848c51491
       | 1     2850.0000000    Store      +2,850 - Store    Level       c59f2ae4-0e94-6ea0-bd82-8306971e628c
       | 1     3350.0000000    Roof       +3,350 - Roof     Level       7b487ac2-e102-dc23-6ad9-81c39124de1d
        
         | slaymaker1907 wrote:
         | PowerShell is actually pretty good at manipulating CSV and
         | JSON. However, I would definitely recommend using v7 (i.e.
         | pwsh) since it has many improvements over v5 (default on
         | Windows). For example, Group-Object seems to be several orders
         | of magnitude faster using the latest version.
        
         | sixothree wrote:
         | There's also ConsoleGridView.
         | 
         | https://devblogs.microsoft.com/powershell/introducing-consol...
        
           | qorrect wrote:
           | Dang, looks sick, wonder if I can get it on *nix.
        
         | hnlmorg wrote:
         | Shameless plug, but so does my shell,
         | https://github.com/lmorg/murex
         | $ open test/example.csv | format generic
         | Login email         Identifier  One-time password  Recovery code  First name  Last name  Department   Location
         | rachel@example.com  9012        12se74             rb9012         Rachel      Booker     Sales        Manchester
         | laura@example.com   2070        04ap67             lg2070         Laura       Grey       Depot        London
         | craig@example.com   4081        30no86             cj4081         Craig       Johnson    Depot        London
         | mary@example.com    9346        14ju73             mj9346         Mary        Jenkins    Engineering  Manchester
         | jamie@example.com   5079        09ja61             js5079         Jamie       Smith      Engineering  Manchester
         | 
         | My shell also aims to have closer compatibility with POSIX
         | (albeit it's not a POSIX shell) so you can use all the same
         | command line tools you're already familiar with too (which, for
         | me at least, was the biggest hurdle in my adoption of
         | PowerShell).
         | 
         | It also supports other file types out of the box too. eg
         | jsonlines
         | $ open test/example.csv | format jsonl
         | ["Login email","Identifier","One-time password","Recovery code","First name","Last name","Department","Location"]
         | ["rachel@example.com","9012","12se74","rb9012","Rachel","Booker","Sales","Manchester"]
         | ["laura@example.com","2070","04ap67","lg2070","Laura","Grey","Depot","London"]
         | ["craig@example.com","4081","30no86","cj4081","Craig","Johnson","Depot","London"]
         | ["mary@example.com","9346","14ju73","mj9346","Mary","Jenkins","Engineering","Manchester"]
         | ["jamie@example.com","5079","09ja61","js5079","Jamie","Smith","Engineering","Manchester"]
        
       ___________________________________________________________________
       (page generated 2021-09-27 23:00 UTC)